titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Question: parsing error first line expecting '<' not found #124

jjgreen12 closed this issue 1 year ago

jjgreen12 commented 1 year ago

Hi, I've installed pubmed_parser and am trying to use the parse_pubmed_xml function for baseline files.

I have extracted the tar.gz baseline archive and am iterating over the individual XML files.

However, I get the following error: lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

When I open the file, there is indeed a '<' in the first line and column. I assume this is a simple oversight on my part. Could you please help me get back on track?

Thank you for your time and consideration.
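
For anyone hitting the same message: lxml raises "Start tag expected, '<' not found, line 1, column 1" when the bytes it is asked to parse do not begin with '<'. Two possible causes when iterating over a folder are a bare filename that does not resolve from the current working directory (so something other than the file contents ends up being parsed) and a file that is still gzip-compressed. Neither cause is confirmed in this thread. The sketch below is a stdlib-only check of what is actually at each path; `diagnose_xml_path` is a hypothetical helper, not part of pubmed_parser.

```python
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of any gzip stream
UTF8_BOM = b"\xef\xbb\xbf"    # optional byte-order mark before the first '<'


def diagnose_xml_path(path):
    """Hypothetical helper: classify a path as 'missing', 'gzip', 'ok' or 'not-xml'."""
    p = Path(path)
    if not p.is_file():
        # Typical for a bare filename that is not relative to the working directory.
        return "missing"
    with p.open("rb") as fh:
        head = fh.read(16)
    if head.startswith(GZIP_MAGIC):
        return "gzip"  # still compressed
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return "ok" if head.startswith(b"<") else "not-xml"


# Demo on a throwaway folder; replace it with the folder holding the extracted files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "sample.xml").write_bytes(b"<?xml version='1.0'?><root/>")
    # glob() yields full paths, so each one resolves regardless of the working directory.
    for xml_path in sorted(folder.glob("*.xml")):
        print(xml_path.name, diagnose_xml_path(xml_path))
    demo_ok = diagnose_xml_path(folder / "sample.xml")

demo_missing = diagnose_xml_path("no_such_dir_for_demo/sample.xml")
```

If every file reports "ok", passing the full path as a string, for example `str(xml_path)`, to the parsing function rules out the path problem. Separately, the project README documents a `parse_medline_xml` function for MEDLINE data, which may be the better fit for baseline files. That is an assumption about the cause here, not something this thread confirms.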