titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

add Language and VernacularTitle element extraction #106

Closed jtourille closed 2 years ago

jtourille commented 2 years ago

Language information and the associated vernacular titles are useful for people working in NLP who want to extract documents in languages other than English. I added this feature to the code and wrote the associated description and tests.
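For readers unfamiliar with these fields, here is a minimal sketch of what extracting them can look like. It uses only the standard library and assumes the usual MEDLINE layout, where `Language` and `VernacularTitle` are children of `Article`. The sample record and the helper name `extract_language_info` are hypothetical and for illustration only; this is not the code added in this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down MEDLINE record used only for illustration.
SAMPLE = """
<MedlineCitation>
  <PMID>0</PMID>
  <Article>
    <ArticleTitle>[An example title].</ArticleTitle>
    <Language>fre</Language>
    <VernacularTitle>Un titre d'exemple.</VernacularTitle>
  </Article>
</MedlineCitation>
"""


def extract_language_info(citation):
    """Return the language codes and vernacular title of one citation.

    Hypothetical helper: missing elements yield an empty list or string
    instead of raising.
    """
    article = citation.find("Article")
    if article is None:
        return {"languages": [], "vernacular_title": ""}

    # A record may declare several languages, so collect all of them.
    languages = [
        el.text.strip() for el in article.findall("Language") if el.text
    ]

    # itertext() also gathers text nested in inline markup such as <i> or <sub>.
    vernacular = article.find("VernacularTitle")
    title = "".join(vernacular.itertext()).strip() if vernacular is not None else ""

    return {"languages": languages, "vernacular_title": title}


record = ET.fromstring(SAMPLE.strip())
info = extract_language_info(record)
print(info)
```

Returning empty values for absent elements keeps the output shape stable across records, since many English-only citations carry no `VernacularTitle`.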

Here is the fine-grained description of the changes: