Closed grabear closed 7 years ago
Hi @robear22890, thanks for the report and sorry for the late reply! It seems like the problem comes from reading the XML file (etree.from_string(path)). Pubmed parser uses this snippet to read XML files. Can you quickly check whether lxml can read the example file, or the file that you have a problem with?
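A quick way to run that check could look like the sketch below. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without extra installs; lxml.etree.parse accepts a file path in the same way. The helper name can_parse_xml is made up for this example and is not part of pubmed_parser.

```python
import xml.etree.ElementTree as ET


def can_parse_xml(path):
    """Try to parse the XML file at `path`.

    Returns (True, root_tag) on success, or (False, error_message) if the
    file cannot be opened or is not well-formed XML.
    Hypothetical helper for illustration only.
    """
    try:
        root = ET.parse(path).getroot()
    except (OSError, ET.ParseError) as exc:
        # OSError covers a missing or unreadable file,
        # ParseError covers malformed XML.
        return False, str(exc)
    return True, root.tag
```

If this returns False with a "No such file" style message, the path is the problem rather than the XML itself.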
From the error, it seems like you were using the wrong function to parse MEDLINE XML. For MEDLINE files, you have to use the parse_medline_xml function instead of parse_pubmed_xml. parse_pubmed_xml is actually for Pubmed Open-Access subset XML files. Let me know if this solves the problem.
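If it is unclear which kind of file you have, looking at the root element usually tells you. The sketch below is only a rough heuristic: it assumes MEDLINE exports have a PubmedArticleSet (or, in older files, MedlineCitationSet) root and that Open-Access subset files have an article root. The helper suggest_parser is hypothetical and only returns the name of the function to try; it does not call pubmed_parser.

```python
import xml.etree.ElementTree as ET

# Assumed root tags for MEDLINE-style files (newer and older exports).
MEDLINE_ROOTS = {"PubmedArticleSet", "MedlineCitationSet"}


def suggest_parser(xml_text):
    """Guess which pubmed_parser function fits the given XML text.

    Hypothetical helper: returns a function name as a string,
    or None if the root element is not recognised.
    """
    root = ET.fromstring(xml_text)
    if root.tag in MEDLINE_ROOTS:
        return "parse_medline_xml"
    if root.tag == "article":
        # Assumed root element of Open-Access subset files.
        return "parse_pubmed_xml"
    return None
```

A file downloaded from PubMed with a tool like easyPubMed would be expected to fall into the first case, but that is worth confirming by opening the file.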
@robear22890 @titipata It seems that the problem is that it cannot find the file? As @titipata mentions, pubmed_parser tries to read the given string as if it were a file path, and if that fails it tries to read it as an XML string. So it first fails to read the file, and then it tries to parse the path itself as XML, which is why the error looks like an XML problem. Can you please check that the file exists at that location?
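To show why a wrong path ends up looking like an XML error, here is a minimal sketch of that "path first, then string" fallback. It is written with the standard library and is not pubmed_parser's actual code; the function name read_xml is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def read_xml(path_or_string):
    """Read XML from a file path, falling back to treating the input as XML text.

    Illustrative sketch of the fallback described above,
    not the library's real implementation.
    """
    try:
        # First attempt: treat the argument as a file path.
        return ET.parse(path_or_string).getroot()
    except OSError:
        # The file could not be opened, so treat the argument as an XML string.
        # A mistyped path reaches this line and fails with a parse error.
        return ET.fromstring(path_or_string)
```

With this pattern, read_xml("/no/such/file.xml") raises a parse error instead of a "file not found" error, so checking os.path.isfile(path) before calling the parser is a cheap way to rule this out.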
I am closing this for now.
Below I've copied my Python session. I'm trying to parse MEDLINE data. I've tried this with both your pubmed and medline parsers on the listed machine as well as on an Ubuntu server, and I get the same error. I've also generated a file using the R programming language. If you are familiar with that, the package I used is called easyPubMed. I used the batch_pubmed_download() function.
Anyway, I'd really like to use your code, especially as it links the authors with their affiliated institutions. I'm new to XML parsing, so I have no idea what I'm doing in that respect.