titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

parse_pubmed_xml does not return "subjects" #87

Closed (thomascpan closed this issue 4 years ago)

thomascpan commented 4 years ago

Describe the bug

parse_pubmed_xml, which parses PubMed OA XML files, appears to have a bug: the "subjects" field of the returned dictionary is never populated.

To Reproduce

import pubmed_parser as pp

dict_out = pp.parse_pubmed_xml(path)  # path: any PubMed OA XML file
dict_out["subjects"]

dict_out["subjects"] will always be empty. The line responsible is https://github.com/titipata/pubmed_parser/blob/1376aa651f05662742e7e225c831c6ffda0dc91b/pubmed_parser/pubmed_oa_parser.py#L148. I believe the following change will fix the issue:

subjects_node = tree.findall(".//article-categories//subj-group/subject")

Expected behavior

parse_pubmed_xml should return the article's subjects in dict_out["subjects"].
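
The proposed XPath can be checked on its own, without the library. The sketch below is only an illustration. It uses the standard library's `xml.etree.ElementTree` as a stand-in for whatever parser `pubmed_parser` uses internally, and both the helper name `extract_subjects` and the sample XML are hypothetical; the sample is a minimal JATS-style fragment, not a real PubMed OA record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written JATS-style fragment (not taken from a real article).
SAMPLE_XML = """
<article>
  <front>
    <article-meta>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="Discipline">
          <subject>Biology</subject>
          <subj-group>
            <subject>Genetics</subject>
          </subj-group>
        </subj-group>
      </article-categories>
    </article-meta>
  </front>
</article>
"""


def extract_subjects(xml_text):
    """Return the text of every <subject> found under <article-categories>.

    Hypothetical helper: it only demonstrates the XPath proposed in this
    issue and is not the library's actual implementation.
    """
    tree = ET.fromstring(xml_text)
    # The XPath suggested in the issue: any <subject> that is a direct child
    # of a <subj-group> located anywhere below <article-categories>.
    subjects_node = tree.findall(".//article-categories//subj-group/subject")
    # itertext() also collects text inside inline markup such as <italic>.
    return ["".join(node.itertext()).strip() for node in subjects_node]


if __name__ == "__main__":
    print(extract_subjects(SAMPLE_XML))
    # ['Research Article', 'Biology', 'Genetics']
```

On this fragment the expression picks up subjects from both top-level and nested subj-group elements, which is the behavior the issue asks for.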