titipata / pubmed_parser

:clipboard: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Error parsing date #154

Closed by nils-herrmann 1 month ago

nils-herrmann commented 1 month ago

There is a minor bug in PR #141 when parsing the year. Some documents have neither a ppub nor a collection date, which means that pub_date_dict has no "year" key at all.

Example with PMC6218202

parse_pubmed_xml(path_to_PMC6218202, nxml=True)

Throws:

    [197] try:
--> [198]     pub_year = int(pub_date_dict["year"])
    [199] except TypeError:
    [200]     pub_year = None

KeyError: 'year'

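Let's just use pub_date_dict.get("year") instead, so a missing key yields None and falls into the existing except TypeError branch.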
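A minimal sketch of what that change could look like, isolated from the library. The helper name `parse_pub_year` and the sample dictionaries are hypothetical and only serve the illustration; the actual code in pubmed_parser may be structured differently.

```python
def parse_pub_year(pub_date_dict):
    """Return the publication year as an int, or None if it is absent.

    Hypothetical helper, not part of pubmed_parser.
    """
    try:
        # dict.get() returns None for a missing key instead of raising
        # KeyError; int(None) then raises TypeError, which is caught below.
        return int(pub_date_dict.get("year"))
    except TypeError:
        return None


# Made-up inputs: one with a year, one without the key.
print(parse_pub_year({"year": "2018"}))
print(parse_pub_year({}))
```

With this pattern the existing `except TypeError` branch handles both a missing key and an explicit `None` value. A non-numeric string would still raise `ValueError`, which this sketch deliberately leaves uncaught.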