deakkon closed this issue 6 years ago.
I don't think `parse_medline_xml` parses `.gz` files. You need to uncompress it first.
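If you do want to uncompress first, here is a minimal standard-library sketch. `gunzip` is a hypothetical helper name, and by default it writes the output next to the input with the trailing `.gz` removed:

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gunzip(src: str, dest: Optional[str] = None) -> str:
    """Decompress a .gz file to disk and return the path of the uncompressed copy."""
    src_path = Path(src)
    # Default output name drops ".gz": pubmed18n0364.xml.gz -> pubmed18n0364.xml
    dest_path = Path(dest) if dest else src_path.with_suffix("")
    with gzip.open(src_path, "rb") as f_in, open(dest_path, "wb") as f_out:
        # Stream in chunks instead of loading the whole file into memory
        shutil.copyfileobj(f_in, f_out)
    return str(dest_path)
```

Usage would then be `pp.parse_medline_xml(gunzip('pubmed18n0364.xml.gz'))`, where `pp` is the same module used elsewhere in this thread.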
Hi,
are you sure? For example, `pp.parse_medline_xml('pubmed18n0364.xml.gz')` (source: ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed18n0364.xml.gz) returns a list of dicts.
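For comparison, a `.gz` file can be read without uncompressing it on disk even with only the standard library. The sketch below does not use pubmed_parser at all; `count_pubmed_articles` is a hypothetical helper that decompresses on the fly and counts `PubmedArticle` elements (the per-record element in the PubMed XML files):

```python
import gzip
import xml.etree.ElementTree as ET


def count_pubmed_articles(path: str) -> int:
    """Count <PubmedArticle> records in a PubMed XML file, compressed or not."""
    # gzip.open decompresses while reading, so nothing is written to disk
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "PubmedArticle":
                count += 1
                elem.clear()  # free memory held by the finished record
    return count
```

Streaming with `iterparse` keeps memory use roughly constant, which matters for baseline files containing tens of thousands of records.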
Can you try uncompressing it first? The file works for me.
Sorry, my mistake! The issue was that the file had not been downloaded properly (I'm performing a batch download and no error was printed).
I re-downloaded it manually and it now works directly from the `.gz` path, without uncompressing it.
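Since the batch download failed silently, it may be worth checking each archive before parsing it. A minimal sketch with hypothetical helper names: `gzip_is_complete` decompresses the whole stream, which makes Python's `gzip` module verify the CRC and length stored in the trailer, and `md5_matches` compares the file against an expected checksum. The NCBI FTP directories appear to provide a `.md5` file next to each archive; reading that file is left out here because its exact format is an assumption.

```python
import gzip
import hashlib
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file is a non-empty gzip stream that decompresses fully."""
    total = 0
    try:
        with gzip.open(path, "rb") as fh:
            # Reading to EOF forces the CRC32/size check in the gzip trailer
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error):
        # OSError (BadGzipFile): not gzip or bad CRC; EOFError: truncated file
        return False
    # An empty file reads as zero bytes without raising, so reject it explicitly
    return total > 0


def md5_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_hex.strip().lower()
```

In a batch loop, files failing either check can be queued for re-download instead of being passed to the parser.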
Best, J.
In [41]: pp.parse_medline_xml('/home/docClass/files/pubmed/pubmed18n1040.xml.gz')
Error: it was not able to read a path, a file-like object, or a string as an XML
File "", line 1
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
Source: ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/pubmed18n1040.xml.gz
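The message `Start tag expected, '<' not found, line 1, column 1` indicates that the first character the parser saw was not `<`. A quick way to see what is actually at the start of the file is a hypothetical `sniff` helper like the one below, which distinguishes an empty file, something that is neither gzip nor XML (e.g. an error response saved under a `.gz` name), and a readable archive. It only inspects the first bytes, so it does not prove the archive is complete:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def sniff(path: str, n: int = 64) -> str:
    """Roughly describe what an XML parser would see at the start of `path`."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    if not head:
        return "empty file"
    if head[:2] == GZIP_MAGIC:
        try:
            with gzip.open(path, "rb") as gz:
                inner = gz.read(n)
        except (OSError, EOFError) as exc:
            return f"gzip header present but stream is unreadable: {exc}"
        if inner.lstrip()[:1] == b"<":
            return "gzip-compressed XML"
        return f"gzip, but content starts with {inner[:16]!r}"
    if head.lstrip()[:1] == b"<":
        return "uncompressed XML"
    return f"not XML/gzip: starts with {head[:16]!r}"
```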