joshfraser / JavaScript-Name-Parser

JavaScript code to split names into their respective components (first, last, etc)
http://www.onlineaspect.com/2009/08/17/splitting-names/
111 stars 34 forks

Test suite #5

Status: Closed (closed by fulldecent 8 years ago)

fulldecent commented 10 years ago

Google Contacts has EXCELLENT name parsing for all languages.

https://www.google.com/contacts/#contacts

"API" at: https://clients6.google.com/plusi/v2/ozInternal/contactstoremutate?key=AIzaSyBuUpn1wi2-0JpM3S-tq2csYx0z2_m_pqc&alt=json

To illustrate: it knows that 诸葛亮 is last name 诸葛 and first name 亮, and it also knows that 柏夫人 is last name 柏 and first name 夫人. It does this without language hinting, and it even distinguishes Chinese names from Japanese names, which can use the same characters.


Although your library does not support this today, I request that these and other examples be added to the test suite. They will fail, but they will demonstrate the scope and limits of this library.
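
A minimal sketch of what such test cases could look like, using the two examples above as expected values. The `naiveParse` stand-in and the `{ firstName, lastName }` result shape are assumptions made for illustration only, not this library's actual API; `runNameCases` is a hypothetical helper that accepts any parser with that shape.

```javascript
// Expected splits, taken from the examples in this issue.
const cjkNameCases = [
  { input: "诸葛亮", lastName: "诸葛", firstName: "亮" },
  { input: "柏夫人", lastName: "柏", firstName: "夫人" },
];

// Hypothetical stand-in parser (NOT the library's code): it splits on
// whitespace only, so it cannot find a boundary inside an unspaced name.
function naiveParse(fullName) {
  const parts = fullName.trim().split(/\s+/);
  return {
    firstName: parts[0],
    lastName: parts.length > 1 ? parts[parts.length - 1] : "",
  };
}

// Runs every case through `parse` and records whether both fields match.
function runNameCases(parse, cases) {
  return cases.map((c) => {
    const actual = parse(c.input);
    const pass =
      actual.firstName === c.firstName && actual.lastName === c.lastName;
    return { input: c.input, pass, actual };
  });
}

const results = runNameCases(naiveParse, cjkNameCases);
for (const r of results) {
  console.log(
    `${r.pass ? "PASS" : "FAIL"} ${r.input} -> ${JSON.stringify(r.actual)}`
  );
}
```

With a whitespace-based parser like the stand-in, both cases are reported as failures, which is the point: the cases document where the boundary of the current approach lies.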

joshfraser commented 10 years ago

That link to their API isn't working for me. Do you have a working link? I'd love to check it out.

fulldecent commented 10 years ago

It is not really an "API", but if you load https://www.google.com/contacts/#contacts and sniff the network traffic, you can see how POST requests let you parse a name.

Key AIzaSyBuUpn1wi2-0JpM3S-tq2csYx0z2_m_pqc is one of my contacts, but you'll need to use one of your own contacts that you have permission to edit. Basically, you send an "Update name to XXX" request and it immediately sends back "{Firstname: XXX, Lastname: YYY, ...}".

joshfraser commented 10 years ago

Oh, got it. Thanks for the explanation, and wow, that is quite impressive. I'm guessing they're using machine learning on the corpus of names they have across Google services. Have you tried wrapping their API for external use? That seems like it might be a valuable exercise for anyone looking for a more advanced solution.

fulldecent commented 8 years ago

closing, out of scope