titipata / affiliation_parser

Simple python parser for MEDLINE, Pubmed OA affiliation string

affiliation parser returns wrong value when ',' is in institution name #15

Open Lix1993 opened 3 years ago

Lix1993 commented 3 years ago

from affiliation_parser import parse_affil
parse_affil("School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen, P. R. China.")

{'full_text': 'School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen, P. R. China.',
 'department': '',
 'institution': 'The Chinese University of Hong Kong',
 'email': '',
 'zipcode': '',
 'location': 'Shenzhen, Longgang District, Shenzhen, P R China',
 'country': 'china'}

Lix1993 commented 3 years ago

The institution should be "The Chinese University of Hong Kong, Shenzhen", but affiliation_parser returns "The Chinese University of Hong Kong".

Lix1993 commented 3 years ago

The two are listed under different GRID IDs:

Chinese University of Hong Kong: grid.10784.3a

Chinese University of Hong Kong, Shenzhen: grid.511521.3

titipata commented 3 years ago

@Lix1993 thanks! I noticed it. I probably wrote the rules wrong here. I'm not sure if I can fix the issue, but I may have a look later on.

Lix1993 commented 2 years ago

Updating UNIVERSITY_MULTIPLE_CAMPUS will fix this.
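
To illustrate the idea, here is a minimal standalone sketch of how a multi-campus table could keep the campus segment attached to the institution name. It does not use or reproduce affiliation_parser's internals: the actual contents and format of UNIVERSITY_MULTIPLE_CAMPUS are not shown in this thread, so `MULTI_CAMPUS`, `find_institution`, and the fallback heuristic below are all hypothetical names and assumptions made for illustration.

```python
# Hypothetical stand-in for the library's UNIVERSITY_MULTIPLE_CAMPUS.
# Assumed format: institution name -> set of campus names that form
# part of the institution name when they follow it after a comma.
MULTI_CAMPUS = {
    "The Chinese University of Hong Kong": {"Shenzhen"},
}


def find_institution(affiliation, multi_campus=MULTI_CAMPUS):
    """Return the institution, keeping a known campus segment attached."""
    # Split on commas, trim whitespace, and drop empty segments.
    parts = [p.strip() for p in affiliation.split(",") if p.strip()]

    for i, part in enumerate(parts):
        campuses = multi_campus.get(part)
        if campuses and i + 1 < len(parts) and parts[i + 1] in campuses:
            # The next segment is a known campus: keep it in the name.
            return f"{part}, {parts[i + 1]}"

    # Simplified fallback (an assumption, not the library's rule):
    # take the first segment that mentions "University".
    for part in parts:
        if "University" in part:
            return part
    return ""


affil = (
    "School of Humanities and Social Science, The Chinese University of "
    "Hong Kong, Shenzhen, Longgang District, Shenzhen, P. R. China."
)
print(find_institution(affil))
```

Keying the table on the institution name and listing the known campuses means an affiliation where a different segment follows the name still resolves to the plain institution name, rather than having an arbitrary segment appended.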