Closed ZhangWoW123 closed 2 weeks ago
I expect the parser to return all of an author's affiliations as a list. Might consider replacing
author.find("AffiliationInfo/Affiliation").text
with
list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation"))))
?
This looks like the solution to me. Do you want to provide a PR? Ideally with an additional test case in https://github.com/titipata/pubmed_parser/blob/master/tests/test_medline_parser.py#L33.
@Michael-E-Rose,
Sure, I created the PR https://github.com/titipata/pubmed_parser/pull/162
Thank you for your service!
Hi team,
Describe the bug: I encountered another issue when using the package to extract PubMed affiliation information from XML files. When an author has multiple affiliations, the parse_medline_xml function extracts only the first affiliation.
To Reproduce: An example of this issue is PMID 39029952. In the XML file, each author has multiple affiliations. medline_parser.parse_author_affiliation uses
author.find("AffiliationInfo/Affiliation")
to find the affiliation information. However, find returns only one object (the first matching element), so only the first affiliation is returned.
Expected behavior: I expect the parser to return all of an author's affiliations as a list. Might consider replacing
author.find("AffiliationInfo/Affiliation").text
with
list(chain(*([c.text] for c in author.findall("AffiliationInfo/Affiliation"))))
?
Screenshots:
XML file example pmid_39029952.txt
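For anyone following along, here is a minimal sketch of the find versus findall difference described above. It uses the standard library's xml.etree.ElementTree rather than the package's own parsing code, and the XML fragment and the helper names first_affiliation and all_affiliations are made up for illustration (they are not taken from PMID 39029952 or from pubmed_parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment (NOT copied from PMID 39029952):
# one author carrying two AffiliationInfo blocks.
SAMPLE_AUTHOR_XML = """
<Author>
  <LastName>Doe</LastName>
  <AffiliationInfo><Affiliation>Dept A, University X</Affiliation></AffiliationInfo>
  <AffiliationInfo><Affiliation>Institute B</Affiliation></AffiliationInfo>
</Author>
""".strip()


def first_affiliation(author):
    """Illustrative helper: find() returns only the first matching element."""
    node = author.find("AffiliationInfo/Affiliation")
    return node.text if node is not None else None


def all_affiliations(author):
    """Illustrative helper: findall() returns every matching element."""
    nodes = author.findall("AffiliationInfo/Affiliation")
    # Skip empty <Affiliation/> elements, whose .text is None.
    return [node.text for node in nodes if node.text]


author = ET.fromstring(SAMPLE_AUTHOR_XML)
print(first_affiliation(author))
print(all_affiliations(author))
```

A plain list comprehension over findall gives the same list as the list(chain(...)) expression proposed above, without needing the itertools import; the None check on find also avoids an AttributeError for authors with no affiliation at all.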
Thank you all for the great support.