huminpurin closed this issue 4 years ago
Hi @huminpurin, can you point me to a sample of the XML file you're working with, and where you got it from? I will have more time next week to fix the library.
Thanks @titipata. I got the file from the official database of the National Library of Medicine: ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
@huminpurin, I see now. We currently do not have an implementation for parsing MEDLINE references. I am checking now whether I can get the references from MEDLINE. Implementing such a function (parse_medline_xml) would be great to have for the library!
@huminpurin, I actually do not see any reference data in the MEDLINE dataset. If you can point me to a specific file that has reference data, I can try to implement it for you.
@titipata Yes, it would be great if there were a (parse_medline_xml) function; after all, all files in the NLM database are MEDLINE XML.
To answer your question, there is a <ReferenceList> tag in the XML files which lists the references of a paper. Here are some example lines from an actual XML file:
Proc Natl Acad Sci U S A. 2012 Apr 10;109(15):5850-5 22454498 ...... ACS Synth Biol. 2017 Jul 21;6(7):1296-1304 28274123
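For anyone following along, here is a minimal sketch of how the entries under <ReferenceList> could be pulled out with the standard library. It is not the library's implementation. The element names (Reference, Citation, ArticleIdList, ArticleId with IdType="pubmed") are my assumption about the layout of these files, the function name extract_references is hypothetical, and the citing PMID in the sample is a placeholder, not real data.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for one record. The two citations are the ones quoted above;
# the citing PMID "00000000" is a placeholder, and the tag layout is an assumption.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><PMID>00000000</PMID></MedlineCitation>
    <PubmedData>
      <ReferenceList>
        <Reference>
          <Citation>Proc Natl Acad Sci U S A. 2012 Apr 10;109(15):5850-5</Citation>
          <ArticleIdList><ArticleId IdType="pubmed">22454498</ArticleId></ArticleIdList>
        </Reference>
        <Reference>
          <Citation>ACS Synth Biol. 2017 Jul 21;6(7):1296-1304</Citation>
          <ArticleIdList><ArticleId IdType="pubmed">28274123</ArticleId></ArticleIdList>
        </Reference>
      </ReferenceList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_references(xml_text):
    """Return one dict per <Reference>, tagged with the PMID of the citing article."""
    root = ET.fromstring(xml_text)
    results = []
    for article in root.iter("PubmedArticle"):
        citing_pmid = article.findtext("MedlineCitation/PMID")
        for ref in article.iter("Reference"):
            cited_pmid = None
            # A reference may carry several IDs (doi, pmc, ...); keep the PubMed one.
            for article_id in ref.iter("ArticleId"):
                if article_id.get("IdType") == "pubmed":
                    cited_pmid = article_id.text
                    break
            results.append({
                "pmid": citing_pmid,
                "citation": ref.findtext("Citation"),
                "pmid_cited": cited_pmid,
            })
    return results


references = extract_references(SAMPLE)
for entry in references:
    print(entry["pmid_cited"], entry["citation"])
```

Articles without a <ReferenceList> simply contribute no rows, which matters because not every record seems to carry one.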
@huminpurin, ah nice, thanks a lot! I did not notice that it existed before. It seems like the references are not available in all of the XML files. I sampled a few publications but still didn't see the ReferenceList. Can you tell me specifically which file you got this example from?
I will take a look and update you soon!
@titipata Oh, I see where the problem is. Not all the files in the following database contain a reference list:
ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
Only some of the files do. I just checked the latest one, pubmed19n1117.xml.gz, and it has a reference list. Maybe the older data doesn't have reference lists.
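A quick way to check whether a downloaded update file has any reference lists at all is to stream through it and count the tags. This is only a sketch: the function name count_reference_lists is made up, and it assumes the file is a gzip-compressed XML whose records are PubmedArticle elements.

```python
import gzip
import xml.etree.ElementTree as ET


def count_reference_lists(path):
    """Count <ReferenceList> elements in a gzip-compressed XML file."""
    count = 0
    with gzip.open(path, "rb") as handle:
        # Stream the file so a large update file is never held fully in memory.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == "ReferenceList":
                count += 1
            elif elem.tag == "PubmedArticle":
                # Drop the finished record's children to keep memory use flat.
                elem.clear()
    return count
```

For example, count_reference_lists("pubmed19n1117.xml.gz") on a locally downloaded copy should return a number greater than zero if the file has references, and 0 for files that have none.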
@huminpurin, sorry for getting back to this so late. I think I've got it now. I will update you in the new PR.
Fixed in #69.
I get "AttributeError: 'NoneType' object has no attribute 'find'" when using parse_pubmed_references on XML files from MEDLINE/PubMed Data (https://www.nlm.nih.gov/databases/download/pubmed_medline.html). parse_medline_xml can parse the XML files but does not return the references. I checked the XML files and I'm sure the reference data is in there. Is there any way to get something like "parse_medline_references"?
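This kind of error usually means an earlier find() returned None because the expected element is not in the file, and the code then called .find on that None. The sketch below reproduces the pattern with the standard library only. It does not use the library's code, and the "back" and "ref-list" tag names are my assumption about what a parser written for a different XML layout might look for. safe_find is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A MEDLINE-style record: there is no <back> element anywhere in it.
record = ET.fromstring(
    "<PubmedArticle><MedlineCitation><PMID>00000000</PMID></MedlineCitation></PubmedArticle>"
)

# find() returns None when nothing matches; it does not raise.
back = record.find("back")

try:
    back.find("ref-list")  # calling .find on None triggers the reported error
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)


def safe_find(element, path):
    """Like element.find(path), but tolerate a missing parent element."""
    if element is None:
        return None
    return element.find(path)


# The guarded version returns None instead of raising.
ref_list = safe_find(record.find("back"), "ref-list")
print(ref_list)
```

If that is what is happening here, the fix is to read the references from the MEDLINE structure rather than to guard the existing call.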