titipata / pubmed_parser

:clipboard: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Add optional parsing of MeSH subheadings #117

Closed: raypereda closed this 2 years ago

raypereda commented 2 years ago

Currently, the parser returns the major MeSH headings. For example:

<MeshHeading>
    <DescriptorName MajorTopicYN="N" UI="D001993">Bronchodilator Agents</DescriptorName>
    <QualifierName MajorTopicYN="Y" UI="Q000008">administration &amp; dosage</QualifierName>
    <QualifierName MajorTopicYN="N" UI="Q000009">adverse effects</QualifierName>
</MeshHeading>

The goal is to return the two QualifierName elements, which represent subheadings.
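A minimal sketch of what returning the subheadings could look like, using only the standard library's `xml.etree.ElementTree`. The function name `parse_mesh_headings` and the output dictionary keys are assumptions made for illustration; they are not the pubmed_parser API, and the wrapping `MeshHeadingList` element is included only so the sample is a complete document.

```python
import xml.etree.ElementTree as ET

# Sample taken from the issue text above, wrapped in a list element.
SAMPLE = """
<MeshHeadingList>
  <MeshHeading>
    <DescriptorName MajorTopicYN="N" UI="D001993">Bronchodilator Agents</DescriptorName>
    <QualifierName MajorTopicYN="Y" UI="Q000008">administration &amp; dosage</QualifierName>
    <QualifierName MajorTopicYN="N" UI="Q000009">adverse effects</QualifierName>
  </MeshHeading>
</MeshHeadingList>
"""


def parse_mesh_headings(xml_text):
    """Hypothetical helper: one dict per MeshHeading, including its subheadings."""
    root = ET.fromstring(xml_text)
    headings = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is None:
            continue  # skip malformed headings without a descriptor
        # Each QualifierName child is a subheading of the descriptor.
        subheadings = [
            {
                "ui": q.get("UI"),
                "name": (q.text or "").strip(),
                "major_topic": q.get("MajorTopicYN") == "Y",
            }
            for q in heading.findall("QualifierName")
        ]
        headings.append(
            {
                "ui": descriptor.get("UI"),
                "name": (descriptor.text or "").strip(),
                "major_topic": descriptor.get("MajorTopicYN") == "Y",
                "subheadings": subheadings,
            }
        )
    return headings


if __name__ == "__main__":
    for h in parse_mesh_headings(SAMPLE):
        print(h["name"], [s["name"] for s in h["subheadings"]])
```

Keeping the subheadings in a nested list leaves the descriptor fields unchanged, which is one way to make the extra data optional for existing callers.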

titipata commented 2 years ago

Sounds great to me. Should I merge this now to add the data, or are you still working on the parser in the PR?

raypereda commented 2 years ago

I'm working on the PR and will send you a note when it's ready for review and merge. Glad you approve of the goal. I will make sure not to break anything for existing users of the library.

raypereda-gr commented 2 years ago

@nicholasjuncos showed me a cool website for generating test PubMed XML files from a list of PubMed IDs: https://pubmed2xl.com/xml/ That's how data/pubmed-29768149.xml was generated.

raypereda-gr commented 2 years ago

Will let you know next week. Using the new code may turn up something that needs a tweak.

DustinHolden commented 2 years ago

With a little refactoring, I think we could eliminate the global function swapping and make this more readable and maintainable. Happy to fork and do the refactoring.
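The PR's code is not quoted in this thread, so the following is only a generic illustration of the kind of refactoring mentioned: replacing a module-level function that is reassigned at runtime with an explicit parameter. Every name here (`parse_headings`, `include_subheadings`, the two formatter helpers) is hypothetical and does not come from pubmed_parser.

```python
# Hypothetical example; none of these names come from pubmed_parser.

def _format_plain(descriptor, qualifiers):
    """Return only the descriptor, ignoring subheadings."""
    return descriptor


def _format_with_subheadings(descriptor, qualifiers):
    """Return the descriptor followed by its subheadings, if any."""
    if not qualifiers:
        return descriptor
    return descriptor + " / " + " / ".join(qualifiers)


# Pattern being avoided: a module-level name that callers reassign before
# parsing, which silently changes behavior for every later call.
#   format_heading = _format_plain
#   format_heading = _format_with_subheadings  # "swap" to enable the feature


def parse_headings(records, include_subheadings=False):
    """Pick the formatter from an explicit argument instead of a global."""
    formatter = _format_with_subheadings if include_subheadings else _format_plain
    return [formatter(descriptor, qualifiers) for descriptor, qualifiers in records]


RECORDS = [
    ("Bronchodilator Agents", ["administration & dosage", "adverse effects"]),
]

if __name__ == "__main__":
    print(parse_headings(RECORDS))
    print(parse_headings(RECORDS, include_subheadings=True))
```

With the choice passed as a keyword argument that defaults to the old behavior, existing callers are unaffected and the two modes can be tested side by side without touching module state.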

titipata commented 2 years ago

@DustinHolden maybe you can coordinate with @raypereda and then I can merge once it's done?

DustinHolden commented 2 years ago

Can do, I'll coordinate with @raypereda. Thanks!

raypereda commented 2 years ago

This PR is done. @titipata Please merge.