titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

physical & electronic publication dates can be mixed into erroneous dates #112

Open ghost opened 2 years ago

ghost commented 2 years ago

Describe the bug

Consider PMC 1280406:

Published online: 2005 May 31
Published in journal: 2005 Sep

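Valid dates would be '2005-09' or '2005-05-31', but pp.parse_pubmed_xml combines the day of the online date with the month of the journal date and returns '31-9-2005' (that is, 2005-09-31, a date that does not exist).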

The culprit is here.

To Reproduce

import pubmed_parser as pp
pp.parse_pubmed_xml('pmc1280406.xml')['publication_date']
'31-9-2005'
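
For illustration, here is a minimal sketch of how this kind of mix-up can arise and one possible way to avoid it. It is not the library's actual code. It assumes the parser takes the first year, month, and day found under any pub-date element, and that the two dates are marked with pub-type="ppub" and pub-type="epub". The XML fragment and the helper names mixed_date and single_element_date are hypothetical, and the sketch uses the standard library's xml.etree.ElementTree rather than whatever pubmed_parser uses internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the dates reported above;
# this is not the real pmc1280406.xml file.
SAMPLE = """
<article-meta>
  <pub-date pub-type="ppub"><month>9</month><year>2005</year></pub-date>
  <pub-date pub-type="epub"><day>31</day><month>5</month><year>2005</year></pub-date>
</article-meta>
"""


def mixed_date(root):
    """Assumed buggy behaviour: take the first day, month, and year
    found under ANY pub-date element, so fields can come from different dates."""
    parts = [root.findtext(f".//pub-date/{tag}") for tag in ("day", "month", "year")]
    return "-".join(p for p in parts if p)


def single_element_date(root, preferred=("epub", "ppub")):
    """Possible fix: read every field from ONE pub-date element,
    trying the pub-type values in the given order of preference."""
    for pub_type in preferred:
        node = root.find(f".//pub-date[@pub-type='{pub_type}']")
        if node is None:
            continue
        year = node.findtext("year")
        if not year:
            continue
        parts = [year]
        month = node.findtext("month")
        day = node.findtext("day")
        if month:
            parts.append(month.zfill(2))
            # Only keep the day when the same element also has a month.
            if day:
                parts.append(day.zfill(2))
        return "-".join(parts)
    return None


root = ET.fromstring(SAMPLE)
mixed = mixed_date(root)
epub_first = single_element_date(root)
ppub_first = single_element_date(root, preferred=("ppub", "epub"))
print(mixed, epub_first, ppub_first)
```

Whether the electronic or the print date should take precedence is a design choice for the maintainers. The point of the sketch is only that all fields should come from the same element.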

pmc1280406.xml.zip

titipata commented 2 years ago

@aren-lorenson-enveda, thanks for the issue. Yes, it does look like there is a bug here. I won't have much time to fix it this month. Hopefully someone who sees this issue can open a PR to fix it!