titipata / pubmed_parser

:clipboard: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License
580 stars, 166 forks

[pubmed oa] Parse conflict of interest statements #90

Closed. simonwoerpel closed this 4 years ago

simonwoerpel commented 4 years ago

Hey, for a project I am involved in, I need to parse the statements about possible conflicts of interest that authors / journals publish. Here is a work-in-progress approach that covers my basic needs for now, but I am happy to discuss how to improve it.
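The basic idea can be sketched in a few lines. This is a minimal sketch, not the code from this PR: it assumes the article follows JATS conventions, where a COI statement sits in an `<fn fn-type="conflict">` footnote, and `parse_coi_statements` is a hypothetical helper name, not part of pubmed_parser's API. Like the work-in-progress approach described above, it collapses every match into a single string.

```python
import xml.etree.ElementTree as ET


def parse_coi_statements(xml_string: str) -> str:
    """Join the text of all conflict-of-interest footnotes into one string.

    Hypothetical helper; assumes JATS-style <fn fn-type="conflict"> markup.
    """
    root = ET.fromstring(xml_string)
    texts = []
    for fn in root.iter("fn"):
        if fn.get("fn-type") == "conflict":
            # itertext() walks nested <p>, <bold>, etc.; split/join collapses whitespace
            text = " ".join("".join(fn.itertext()).split())
            if text:
                texts.append(text)
    # Everything ends up in one string, which loses the per-statement structure
    return " ".join(texts)
```

Returning one flat string is the simplest thing that works, but it cannot tell the caller where each statement came from or how many there were.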

codecov-commenter commented 4 years ago

Codecov Report

Merging #90 into master will not change coverage. The diff coverage is 0.00%.


@@          Coverage Diff           @@
##           master     #90   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files           5       5           
  Lines         729     740   +11     
======================================
- Misses        729     740   +11     
Impacted Files                       Coverage Δ
pubmed_parser/pubmed_oa_parser.py    0.00% <0.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d958d74...963a025.

titipata commented 4 years ago

@simonwoerpel, this looks good to me! Maybe you can also add an example to the tests, and then I can merge this?

simonwoerpel commented 4 years ago

Sure I can, but I was not sure whether this is already "enough" or whether we should look more closely at how these statements are structured in the XML. I also haven't checked yet whether other XPaths contain possible statements. So my solution feels a bit hacky, since it just yields everything into one string... that's why I opened this as a WIP for further discussion :see_no_evil:
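One way to look for more candidate XPaths is to survey which `fn-type`, `sec-type` and `notes-type` values actually occur across a batch of articles. The sketch below assumes those three JATS attributes are the relevant ones; `survey_type_attributes` and `coi_candidates` are hypothetical helpers, and the keyword filter is only a heuristic, not a definitive rule.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: in JATS, statements such as COI usually carry a type attribute on
# one of these elements. Other publishers may use different markup.
TYPE_ATTRIBUTES = {"fn": "fn-type", "sec": "sec-type", "notes": "notes-type"}


def survey_type_attributes(xml_strings):
    """Count (tag, type-value) pairs across a batch of article XML strings."""
    counts = Counter()
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        for elem in root.iter():
            attr_name = TYPE_ATTRIBUTES.get(elem.tag)
            value = elem.get(attr_name) if attr_name else None
            if value:
                counts[(elem.tag, value)] += 1
    return counts


def coi_candidates(counts, keywords=("conflict", "coi", "competing")):
    """Keyword heuristic: keep (tag, value) pairs whose value mentions a COI term."""
    return [
        key
        for key, _ in counts.most_common()
        if any(word in key[1].lower() for word in keywords)
    ]
```

Running the survey over a sample of OA files and reading the rarer values by hand is probably more reliable than trusting the keyword filter alone.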

simonwoerpel commented 4 years ago

I will continue hacking on this and maybe add a few more commits.

titipata commented 4 years ago

@simonwoerpel no worries! Yeah, if you can do a bit of testing, that would be awesome!

titipata commented 4 years ago

@simonwoerpel, should I merge this PR first or should I wait for a bit? Thanks!!

simonwoerpel commented 4 years ago

Hey, I will make some improvements to the COI (conflict of interest) parsing this week and then go for the merge! Thanks for your patience. :slightly_smiling_face:

titipata commented 4 years ago

Thanks for the follow-up! I can check it right after you're done. Thank you!

simonwoerpel commented 4 years ago

OK, I had a closer look at it now, found some more XPaths that can contain conflict of interest statements, and added a line to the test in tests/test_pubmed_oa_parser.py.

Looks good to me now! :slightly_smiling_face:

titipata commented 4 years ago

Woohoo! This looks good!