Open noamross opened 10 years ago
Re. the abbreviation of names: perhaps we can take the first n (= 3 or 4) letters of each first name. That should solve some of it.
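Something like this is what I have in mind for the first-n-letters idea. It is only a rough sketch: `name_key` is a made-up helper, and it assumes names arrive as "First [Middle] Last" strings, which the ESA program data may not guarantee. It drops middle names and initials, so it would catch "Rich"/"Richard" and "Simon Levin"/"Simon A. Levin", but it does nothing for surname changes or nicknames like "Bob"/"Robert", and it can wrongly merge different people who share a surname and the first few letters of a first name.

```python
import re

def name_key(full_name, n=3):
    """Build a matching key: first n letters of the first name plus the surname.

    Hypothetical helper; assumes "First [Middle] Last" ordering.
    """
    # Strip punctuation such as the period in "A.", keep letters, spaces, hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", full_name).lower()
    parts = cleaned.split()
    if not parts:
        return ""
    if len(parts) == 1:
        # Only one token: nothing to abbreviate against, use it as-is.
        return parts[0]
    first, last = parts[0], parts[-1]
    # Middle names and initials are ignored on purpose.
    return f"{first[:n]} {last}"

# Both variants of each name collapse to the same key.
print(name_key("Rich Smith"), "|", name_key("Richard Smith"))
print(name_key("Simon Levin"), "|", name_key("Simon A. Levin"))
```

Using n = 4 instead would merge fewer distinct people, at the cost of missing short forms such as "Rich" vs. "Rick".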
My guess is that unless some really well-connected people are systematically affected by this, the network metrics should be relatively robust to it. That is primarily because the number of people involved is really large, so the nodes that contribute a lot to the overall properties should be few.
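One way to check that guess would be to deliberately split a name in the data and see which metrics move. Below is a toy sketch in plain Python: the paper lists and author labels are invented, and `coauthor_degrees`, `mean_degree`, and `split_author` are hypothetical helpers, not anything from the actual analysis. In this toy data, splitting either author shifts the mean degree by the same small amount, while the maximum degree only changes when the well-connected author is the one split.

```python
from collections import defaultdict
from itertools import combinations

def coauthor_degrees(papers):
    """Map each author to their number of distinct co-authors."""
    neighbors = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            neighbors[a]  # make sure solo authors still get a node
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {a: len(nbrs) for a, nbrs in neighbors.items()}

def mean_degree(papers):
    deg = coauthor_degrees(papers)
    return sum(deg.values()) / len(deg) if deg else 0.0

def split_author(papers, author, alias, paper_indices):
    """Simulate a name variant: rename `author` to `alias` on the chosen papers."""
    idx = set(paper_indices)
    return [
        [alias if (i in idx and a == author) else a for a in authors]
        for i, authors in enumerate(papers)
    ]

# Invented data: "hub" is well connected, "x" is not.
papers = [
    ["hub", "a", "b"],
    ["hub", "c"],
    ["hub", "d", "e"],
    ["x", "y"],
    ["x", "z"],
]

hub_split = split_author(papers, "hub", "hub2", [2])
leaf_split = split_author(papers, "x", "x2", [4])

for label, data in [("clean", papers), ("hub split", hub_split), ("leaf split", leaf_split)]:
    deg = coauthor_degrees(data)
    print(label, "mean:", round(mean_degree(data), 3), "max:", max(deg.values()))
```

On the real data the same idea could be repeated over many random splits to see how much each metric varies.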
Generating a network of co-authors from names in the ESA program is problematic because an individual's name may vary year-to-year or even within a year. This happens for two main reasons: co-authors' names are entered inconsistently ("Rich" instead of "Richard", or "Simon Levin" vs. "Simon A. Levin"), and people's names actually change. The latter primarily affects early-career women who are likely to change their names due to marriage.
This introduces some systematic bias into any measures of the network, so we have to ask: (1) how can we correct these errors, and (2) what measures are robust to them? Network analysis isn't my forte, so I'd like some feedback on this.
Regarding (1):
Regarding (2):
Other thoughts?