soupstandstop closed this issue 4 years ago
I expected it to return the section text of this PMC article.
@soupstandstop, that particular file has no text content for the given PMC article. You can check the file inside the data
folder.
Yes, but even for a file whose PMC article does contain text, I still get an empty list.
The nxml file in PMC5640403.tar.gz, under ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/, has text. You can check this file: the return value is still empty.
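To reproduce this locally, the .nxml file first has to be pulled out of the downloaded package. The following is a minimal sketch using only the Python standard library; the helper name `extract_nxml` and its arguments are hypothetical and not part of pubmed_parser, and it assumes the archive has already been downloaded to disk.

```python
import os
import tarfile


def extract_nxml(archive_path, dest_dir):
    """Copy every .nxml member of a .tar.gz archive into dest_dir.

    Hypothetical helper (not part of pubmed_parser).
    Returns the paths of the files that were written.
    """
    written = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories and anything that is not an .nxml file.
            if not (member.isfile() and member.name.endswith(".nxml")):
                continue
            # Use only the base name so archive paths cannot escape dest_dir.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

Each returned path can then be passed to pp.parse_pubmed_paragraph to check whether the result is still empty.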
@soupstandstop, sorry for the late reply and thanks! I can check it later this week. If you find a way to resolve the issue, please feel free to make a pull request!
I made the pull request. If there is any problem, please tell me. Thanks!
I observed the same behavior for all NXML files (even the data samples) with the function pp.parse_pubmed_paragraph. I am debugging it right now to find out why this is happening and will keep you updated.
Thanks so much @MananVyas24. Let me know if you figure out where the error comes from. I do not have much time to debug but will check out the PR as soon as possible!
@soupstandstop I checked, and b2ccfe7 and a769744 fix this issue. Let me know if you have any problem parsing the paragraph text using a recent version of pubmed_parser.
Below is a snippet that parses the nxml file
from ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC5640403.tar.gz
import pubmed_parser as pp
import pandas as pd
paragraphs = pp.parse_pubmed_paragraph('ott-10-4895.nxml')
>> [
    {'pmc': '5640403',
     'pmid': '29070952',
     'reference_ids': [
         'b1-ott-10-4895',
         'b2-ott-10-4895',
         'b3-ott-10-4895',
         'b4-ott-10-4895',
         'b5-ott-10-4895',
         'b6-ott-10-4895'],
     'section': 'Introduction',
     'text': 'With an incidence rate ...
    }, ...
]
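Since the original question was about getting the text of each section, the list of paragraph dicts shown above can be regrouped by its 'section' key. This is only a sketch: the helper name `text_by_section` is hypothetical and not part of pubmed_parser, and it assumes each dict carries the 'section' and 'text' keys seen in the output above.

```python
from collections import defaultdict


def text_by_section(paragraphs):
    """Join paragraph texts per section, keeping first-seen section order.

    Hypothetical helper; expects dicts with 'section' and 'text' keys.
    """
    grouped = defaultdict(list)
    for paragraph in paragraphs:
        # Missing keys fall back to an empty string instead of raising.
        grouped[paragraph.get("section", "")].append(paragraph.get("text", ""))
    return {section: "\n".join(texts) for section, texts in grouped.items()}
```

For example, `text_by_section(paragraphs)["Introduction"]` would give the joined text of every paragraph whose section is 'Introduction'.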
Hi, when I run:
pp.parse_pubmed_paragraph('data/6605965a.nxml', all_paragraph=False)
why is the return value an empty list?
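One possible cause, which should be checked against the pubmed_parser documentation, is that with all_paragraph=False only paragraphs that cite at least one reference are kept, so a file with no body text or no in-text citations would yield an empty list. The sketch below is a standard-library diagnostic for that hypothesis; the function name `count_body_paragraphs` is hypothetical, and it assumes the file is plain JATS XML that `xml.etree.ElementTree` can parse (no undefined named entities).

```python
import xml.etree.ElementTree as ET


def count_body_paragraphs(nxml_source):
    """Return (total, cited) counts of <p> elements inside <body>.

    Hypothetical diagnostic, not part of pubmed_parser. A paragraph counts
    as cited when it contains an <xref ref-type="bibr"> element.
    `nxml_source` is a file path or a binary file object.
    """
    root = ET.parse(nxml_source).getroot()
    body = root.find(".//body")
    if body is None:
        # No <body> at all: the file carries no full text.
        return 0, 0
    paragraphs = list(body.iter("p"))
    cited = sum(
        1
        for p in paragraphs
        if any(x.get("ref-type") == "bibr" for x in p.iter("xref"))
    )
    return len(paragraphs), cited
```

If `count_body_paragraphs('data/6605965a.nxml')` reports zero total paragraphs, the file has no body text to return; if only the cited count is zero, calling the parser with all_paragraph=True may be worth trying.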