titipata / affiliation_parser

Simple Python parser for MEDLINE and PubMed OA affiliation strings

email parse suggestion #20

Open Lix1993 opened 3 years ago

Lix1993 commented 3 years ago

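According to NCBI's documentation, the email address, when present, is always the last element of the affiliation string. Splitting the string on ' ' and checking whether the last token is an email address could give a more accurate result. The relevant part of the documentation: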

Affiliation (AD)
The affiliation of the authors, corporate authors and investigators appear in this repeating field.  Until 2014, only the affiliation of the first author was included.  The data included in this field and control of the data has changed over time, as follows:

1988- The address of the first author's affiliation is included. The institution, city, and state including zip code for U.S. addresses, and country for countries outside of the United States, are included if provided in the journal; sometimes the street address is also included if provided in the journal.
1995-2013 The designation USA is added at the end of the address when the first author's affiliation is in the fifty United States or the District of Columbia.
1996- The primary author's electronic mail (e-mail) address is included at the end of the Affiliation field, if present in the journal.
2003- The complete first author address is entered as it appears in the article with no words omitted.
October 2013- Quality control of this field ceased in order to accommodate the affiliations for all authors and contributors.
December 2014- Multiple affiliations for each author or contributor are included.
The affiliation data is provided as supplied by the publisher. Publishers are requested to include the following data if available, separated by commas: division of the institution, institution name, city, state, postal or zip code, country (USA for the United States) followed by a period, then a space followed by the e-mail address.
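
The suggestion at the top of this comment could be sketched roughly as follows. This is only an illustration, not the parser's current code: the function name `split_trailing_email` and the loose `EMAIL_RE` pattern are hypothetical, and the sample affiliation string is made up. It assumes the email, if present, is the last space-separated token, possibly followed by stray punctuation.

```python
import re

# Deliberately loose, hypothetical pattern: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_trailing_email(affiliation):
    """Return (affiliation_without_email, email), where email is None if absent."""
    text = affiliation.strip()
    tokens = text.split(" ")
    # Publishers sometimes leave a trailing period or comma after the address.
    candidate = tokens[-1].rstrip(".,;")
    if EMAIL_RE.match(candidate):
        return " ".join(tokens[:-1]).strip(), candidate
    return text, None


# Made-up example string following the publisher format quoted above.
sample = "Department of Biology, Example University, Boston, MA 02115, USA. jane.doe@example.edu"
address, email = split_trailing_email(sample)
```

Only the last token is inspected, so an "@" appearing elsewhere in the affiliation is not mistaken for the author's email. Records with several addresses, or with a label such as "Electronic address:" before the email, would need extra handling.
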
Lix1993 commented 3 years ago

https://www.nlm.nih.gov/bsd/mms/medlineelements.html#cn