yhydhx / python-nameparser

Automatically exported from code.google.com/p/python-nameparser

Nickname #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Could you add a nickname name part? 

I am building an elections results scraper, and I frequently encounter cases 
like these:

Charles "Chuck" Wilson
Robert (Bud) McGee

In the former case, the nickname is parsed as a middle name. In the latter 
case, the nickname is completely ignored.

Original issue reported on code.google.com by kirkma...@gmail.com on 31 Mar 2014 at 9:31

GoogleCodeExporter commented 9 years ago
Seems possible. 

Parentheses can also occur in contact import situations, e.g. "John Doe (Google 
Docs)" as noted in issue 17 below. Currently their contents are simply dropped, 
but we could put them in a nickname part. They may not always be a nickname, 
though. It might be nice to have an optional parameter that takes a set() of names 
like "Google Docs" to ignore as nicknames.

Quotation marks are also completely ignored and end up as part of the name. I 
wonder how safe it is to assume that any name in quotes is a nickname. 
Thoughts? Do single quotes (') appear in names? Maybe any name piece with 
matching quote marks on both ends would qualify? If you happen to know the regex 
for that, you'd make me happy. :)
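For reference, here is a rough sketch of the kind of regex being asked about. It is not the code that ended up in nameparser. The pattern, the function name `split_nickname`, and the `ignore` parameter are all hypothetical, chosen only to illustrate pulling out text wrapped in double quotes or parentheses and skipping known non-nicknames such as "Google Docs".

```python
import re

# Hypothetical pattern: text wrapped in double quotes, or text in parentheses.
NICKNAME_RE = re.compile(r'"([^"]+)"|\(([^)]+)\)')


def split_nickname(full_name, ignore=frozenset()):
    """Return (name_without_nicknames, nicknames) for a raw name string.

    `ignore` is an assumed optional set of strings (e.g. "Google Docs")
    that are removed from the name but not reported as nicknames.
    """
    found = [(m.group(1) or m.group(2)).strip()
             for m in NICKNAME_RE.finditer(full_name)]
    nicknames = [n for n in found if n not in ignore]
    # Replace each quoted/parenthesized chunk with a space, then
    # collapse the whitespace left behind.
    cleaned = " ".join(NICKNAME_RE.sub(" ", full_name).split())
    return cleaned, nicknames


print(split_nickname('Charles "Chuck" Wilson'))
print(split_nickname("Robert (Bud) McGee"))
print(split_nickname("John Doe (Google Docs)", ignore={"Google Docs"}))
```

Because the pattern only looks at double quotes and parentheses, apostrophes inside names such as O'Connor are left alone.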

https://code.google.com/p/python-nameparser/issues/detail?id=17

Original comment by dere...@gmail.com on 1 Apr 2014 at 11:31

GoogleCodeExporter commented 9 years ago
After looking into this a bit more I think it will mostly work out.

I don't know if it's worth handling single quotes around nicknames. I guess 
single quotes/apostrophes are somewhat common in names, e.g. Jeff O'Connor, or 
sometimes to represent non-ASCII letters, e.g. Mari' Abue'. My current 
implementation doesn't do anything with single quotes. It would be possible to 
do it at the name piece level. Let me know if you think it's worth it.
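To make the "name piece level" idea concrete, here is a minimal sketch, assuming a piece is simply a whitespace-separated token. The pattern and the helper `single_quoted_nicknames` are hypothetical and not part of nameparser. A piece counts as a nickname only when it has a single quote at both ends.

```python
import re

# Hypothetical rule: a piece is a nickname only if it starts AND ends
# with a single quote, with at least one character in between.
SINGLE_QUOTED_PIECE = re.compile(r"^'(.+)'$")


def single_quoted_nicknames(full_name):
    """Split on whitespace and pull out pieces wrapped in single quotes."""
    remaining, nicknames = [], []
    for piece in full_name.split():
        match = SINGLE_QUOTED_PIECE.match(piece)
        if match:
            nicknames.append(match.group(1))
        else:
            remaining.append(piece)
    return " ".join(remaining), nicknames


print(single_quoted_nicknames("Charles 'Chuck' Wilson"))
print(single_quoted_nicknames("Jeff O'Connor"))
print(single_quoted_nicknames("Mari' Abue'"))
```

Under this rule O'Connor, Mari' and Abue' stay untouched, since none of them has a quote on both ends. A multi-word nickname such as 'Big Joe' would not be caught, because each token is checked on its own.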

Also just to note, I'll treat nicknames the same as titles so they are not used 
in the equals test, since the presence or absence of a nickname doesn't 
indicate a different person.
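As an illustration of that equality rule, here is a small sketch using a stand-in class. `ParsedName` and its fields are hypothetical and are not nameparser's actual class. The comparison is made case-insensitive here purely as an assumption. The point is only that title and nickname are left out of the comparison.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class ParsedName:
    """Hypothetical stand-in for a parsed name, for illustration only."""
    first: str = ""
    middle: str = ""
    last: str = ""
    title: str = ""
    nickname: str = ""

    def _identity(self):
        # Only these parts decide whether two names are the same person;
        # title and nickname are deliberately excluded.
        return (self.first.lower(), self.middle.lower(), self.last.lower())

    def __eq__(self, other):
        if not isinstance(other, ParsedName):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        return hash(self._identity())


with_nick = ParsedName(first="Charles", last="Wilson", nickname="Chuck")
without_nick = ParsedName(first="Charles", last="Wilson")
print(with_nick == without_nick)
```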

Original comment by dere...@gmail.com on 2 Apr 2014 at 1:52

GoogleCodeExporter commented 9 years ago
I committed these changes. If you could take a look before I push a new version 
to PyPI and let me know whether there are any problems with your real-world 
data, that would be helpful.

pip install -e hg+http://code.google.com/p/python-nameparser/#egg=nameparser

Original comment by dere...@gmail.com on 2 Apr 2014 at 2:24

GoogleCodeExporter commented 9 years ago
I went ahead and pushed out the new revision that includes the nickname, 
v0.2.9. 

I am also moving the repository over to GitHub, so if you're installing the dev 
version be sure to point your requirements over there, e.g.: 

pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser

Original comment by dere...@gmail.com on 3 Apr 2014 at 12:19