titipata / pubmed_parser

:clipboard: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Added issue, page number, and firstname to medline_parse_xml() #100

Closed by raypereda-gr 3 years ago

raypereda-gr commented 3 years ago
  1. Handling forename and initials is tricky. Previously, the parser returned firstname populated with the text of the initials tag. With this change, the parse function returns both forename and initials. Because this changes the dictionary keys returned by the parser, I bumped the version number. In the XML, initials holds the initials of the forename only, not of the whole name. Before: { "firstname": "JP", "lastname": "Smith" ... } After: { "forename": "John Paul", "initials": "JP", "lastname": "Smith" ... }

  2. Added an issue key that combines the volume and issue information, for example "issue": "50(2)".

  3. Added a pages key, populated from the Pagination/MedlinePgn tag text.

  4. Added a test for the issue and page number additions.
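
To illustrate the first three items, here is a minimal sketch using only the standard library. It is not the code in this PR: the helper names `text_of` and `parse_citation`, the sample XML, and the shape of the returned dictionary are assumptions made for illustration, and the real output of medline_parse_xml() may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down MEDLINE citation used only for illustration.
SAMPLE = """
<MedlineCitation>
  <Article>
    <Journal>
      <JournalIssue>
        <Volume>50</Volume>
        <Issue>2</Issue>
      </JournalIssue>
    </Journal>
    <Pagination>
      <MedlinePgn>123-30</MedlinePgn>
    </Pagination>
    <AuthorList>
      <Author>
        <LastName>Smith</LastName>
        <ForeName>John Paul</ForeName>
        <Initials>JP</Initials>
      </Author>
    </AuthorList>
  </Article>
</MedlineCitation>
"""


def text_of(node, path):
    """Return the stripped text at `path`, or "" if the tag is missing or empty."""
    found = node.find(path)
    return (found.text or "").strip() if found is not None else ""


def parse_citation(xml_string):
    """Extract author names, a combined volume(issue) string, and pages."""
    article = ET.fromstring(xml_string).find("Article")

    # Keep ForeName and Initials as separate keys instead of one "firstname".
    authors = [
        {
            "forename": text_of(author, "ForeName"),
            "initials": text_of(author, "Initials"),
            "lastname": text_of(author, "LastName"),
        }
        for author in article.findall("AuthorList/Author")
    ]

    # Combine volume and issue as "volume(issue)"; fall back to volume alone.
    volume = text_of(article, "Journal/JournalIssue/Volume")
    issue = text_of(article, "Journal/JournalIssue/Issue")
    combined = f"{volume}({issue})" if issue else volume

    return {
        "authors": authors,
        "issue": combined,
        "pages": text_of(article, "Pagination/MedlinePgn"),
    }


record = parse_citation(SAMPLE)
print(record)
```

Returning an empty string for a missing tag keeps the keys stable across records, which matters here because callers now depend on the new forename and initials keys.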

Thanks for open sourcing this @titipata

titipata commented 3 years ago

This is super cool! I hadn't noticed that we can access both forename and initials. Going to merge this PR. Thanks @raypereda-gr!