Name errors in co-author network analysis

Generating a network of co-authors from names in the ESA program is problematic because an individual's name may vary from year to year, or even within a single year. There are two main reasons: co-authors' names are entered inconsistently ("Rich" instead of "Richard", or "Simon Levin" vs. "Simon A. Levin"), and people's names actually change. The latter primarily affects early-career women, who are more likely to change their names due to marriage.

This introduces some systematic bias into any measure of the network, so we have to ask: (1) how can we correct these errors, and (2) which measures are robust to them? Network analysis isn't my forte, so I'd like some feedback on this.

Regarding (1):

I contacted ESA. They told me that they don't know whether their back-end database could help, but in any case they won't give access to it for privacy reasons, and they don't have the capacity to do anything themselves.
I think I can reduce some of the fragmentation by merging names automatically. For instance, I can merge names that share the same first and last name, where one variant has a middle initial and the other does not. There are probably a few other rules like that which would yield a net reduction in error. I have not had any success finding libraries specific to this type of name processing.
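To make the middle-initial rule concrete, here is a rough sketch of how it might look in Python. The function names (`split_name`, `build_name_map`) are made up for illustration, and it assumes names are written as "First [Middle] Last". It deliberately refuses to merge when two variants carry different middle initials, since those could be different people. It does not handle nicknames such as "Rich"/"Richard" or actual name changes.

```python
from collections import defaultdict


def split_name(name):
    """Split 'First [Middle ...] Last' into (first, middle initials, last).

    Assumes the name has no suffixes or 'Last, First' ordering.
    """
    parts = name.replace(".", "").split()
    if not parts:
        raise ValueError("empty name")
    first, last = parts[0].lower(), parts[-1].lower()
    # Reduce any middle names to their initials: "Asher" -> "a"
    middle = "".join(p[0].lower() for p in parts[1:-1])
    return first, middle, last


def build_name_map(names):
    """Map each name variant to a canonical spelling.

    Variants sharing first and last name are merged only if they do not
    disagree on the middle initial.
    """
    groups = defaultdict(set)
    for name in names:
        first, middle, last = split_name(name)
        groups[(first, last)].add((middle, name))

    mapping = {}
    for members in groups.values():
        middles = {m for m, _ in members if m}
        if len(middles) > 1:
            # Conflicting middle initials: leave every variant untouched.
            for _, name in members:
                mapping[name] = name
            continue
        # Use the longest variant as canonical (ties: alphabetically first).
        canonical = max(sorted(n for _, n in members), key=len)
        for _, name in members:
            mapping[name] = canonical
    return mapping


name_map = build_name_map(["Simon Levin", "Simon A. Levin"])
print(name_map["Simon Levin"])
```

Picking the longest variant as canonical is an arbitrary choice; any stable rule would do, as long as all variants in a group end up with the same label.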

Regarding (2):

I believe this fragmentation will bias modularity upwards, but the bias should be consistent across years, so trends in modularity over time may be robust.
Pretty much any analysis that identifies individuals by name, e.g., "Who collaborates on the most abstracts?" or "Who most frequently presents solo papers?", isn't robust to these issues.
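A toy illustration of why per-person counts are fragile: the abstracts below are invented, and "A. Author" / "B. Author" are placeholder co-authors. The helper `abstracts_per_author` and the hand-written `name_map` are assumptions for the sake of the example, not part of any existing code.

```python
from collections import Counter

# Invented data: three abstracts, with one person entered under two spellings.
abstracts = [
    ["Simon Levin", "A. Author"],
    ["Simon A. Levin", "B. Author"],
    ["Simon A. Levin"],
]


def abstracts_per_author(abstracts, name_map=None):
    """Count abstracts per author, optionally relabelling names first."""
    name_map = name_map or {}
    counts = Counter()
    for authors in abstracts:
        # Use a set so each person is counted once per abstract.
        for person in {name_map.get(a, a) for a in authors}:
            counts[person] += 1
    return counts


raw = abstracts_per_author(abstracts)
merged = abstracts_per_author(abstracts, {"Simon Levin": "Simon A. Levin"})

print(raw["Simon Levin"], raw["Simon A. Levin"])  # counts split across two spellings
print(merged["Simon A. Levin"])                   # single count after merging
```

Without the merge, the same person shows up as two less productive authors, which is the kind of error that any ranking by name would inherit.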

Other thoughts?
