jakelever / biotext

Get a nicely-chunked local copy of the biomedical literature (to use for other projects)!
MIT License

Special Case: Pubmed Abstract Headers dropped in bioc output #19

Open creisle opened 1 year ago

creisle commented 1 year ago

For the following articles, the abstract section headers ("Aim", "Conclusion", etc.) are dropped from the final BioC output, so we lose some context:

https://pubmed.ncbi.nlm.nih.gov/26161928/

| Input XML | Proposed Parse | Current Parse |
| --- | --- | --- |
| `<AbstractText Label="AIM" NlmCategory="OBJECTIVE">To investigate the impact of KRAS mutation variants on the activity of regorafenib in SW48 colorectal cancer cells.</AbstractText>` | AIM: To investigate the impact of KRAS mutation variants on the activity of regorafenib in SW48 colorectal cancer cells. | To investigate the impact of KRAS mutation variants on the activity of regorafenib in SW48 colorectal cancer cells. |
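A minimal sketch of the proposed parse, in case it helps the discussion. This is not biotext's actual code: the function name `abstract_text_with_labels` is hypothetical, the enclosing `<Abstract>` wrapper in the sample is assumed for illustration, and only the standard library's `xml.etree.ElementTree` is used. It prepends the `Label` attribute, when present, to the text of each `AbstractText` element.

```python
import xml.etree.ElementTree as ET

# Sample input taken from the issue; the <Abstract> wrapper is an assumption
# added so the snippet parses as a standalone document.
SAMPLE = (
    "<Abstract>"
    '<AbstractText Label="AIM" NlmCategory="OBJECTIVE">'
    "To investigate the impact of KRAS mutation variants on the activity "
    "of regorafenib in SW48 colorectal cancer cells."
    "</AbstractText>"
    "</Abstract>"
)


def abstract_text_with_labels(abstract_elem):
    """Return one string per AbstractText, prefixed with its Label if any.

    Hypothetical helper for illustration, not part of biotext.
    """
    parts = []
    for node in abstract_elem.iter("AbstractText"):
        # itertext() also collects text nested in inline child elements.
        text = "".join(node.itertext()).strip()
        label = (node.get("Label") or "").strip()
        if not text:
            continue
        # Keep the label as context instead of silently dropping it.
        parts.append(f"{label}: {text}" if label else text)
    return parts


passages = abstract_text_with_labels(ET.fromstring(SAMPLE))
print(passages[0])
```

Unlabelled `AbstractText` elements pass through unchanged, so abstracts without structured sections would be unaffected under this approach.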