titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Error while reading Medline gz file from path #44

Closed deakkon closed 4 years ago

deakkon commented 7 years ago

In [234]: pp.parse_pubmed_xml('/home/docClass/files/pubmed/medline17n0330.xml.gz')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-234-bfaa6482d6d1> in <module>()
----> 1 pp.parse_pubmed_xml('/home/docClass/files/pubmed/medline17n0330.xml.gz')

/root/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pubmed_parser-0.1.dev0-py2.7.egg/pubmed_parser/pubmed_oa_parser.pyc in parse_pubmed_xml(path, include_path)
    108         journal = ''
    109
--> 110     dict_article_meta = parse_article_meta(tree)
    111     pub_year_node = tree.find('//pub-date/year')
    112     pub_year = pub_year_node.text if pub_year_node is not None else ''

/root/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pubmed_parser-0.1.dev0-py2.7.egg/pubmed_parser/pubmed_oa_parser.pyc in parse_article_meta(tree)
     56     """
     57     article_meta = tree.find('//article-meta')
---> 58     pmid_node = article_meta.find('article-id[@pub-id-type="pmid"]')
     59     pmc_node = article_meta.find('article-id[@pub-id-type="pmc"]')
     60     pub_id_node = article_meta.find('article-id[@pub-id-type="publisher-id"]')

AttributeError: 'NoneType' object has no attribute 'find'

titipata commented 7 years ago

Thanks @deakkon, I'll take a look at that particular file and fix it ASAP!

daniel-acuna commented 7 years ago

@titipata Can pubmed_parser parse gzip files? I don't think it can.

titipata commented 4 years ago

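MEDLINE gz files should be parsed with parse_medline_xml, not parse_pubmed_xml. parse_pubmed_xml expects PubMed Open-Access XML, so on a MEDLINE file it finds no article-meta element and raises the AttributeError shown above. I'll close this issue for now.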
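
For anyone unsure which of the two parsers a given file needs, here is a minimal sketch that peeks at the root element before choosing. It uses only the standard library and is not part of pubmed_parser. The helper names `open_maybe_gzip` and `detect_xml_flavor` are hypothetical, and the root tags are assumptions: `PubmedArticleSet` or `MedlineCitationSet` for MEDLINE, `article` for PubMed Open-Access.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed root tags: MEDLINE files vs. PubMed Open-Access (JATS) articles.
MEDLINE_ROOTS = {"PubmedArticleSet", "MedlineCitationSet"}
PUBMED_OA_ROOTS = {"article"}


def open_maybe_gzip(path):
    """Open a file in binary mode, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rb")
    return open(path, "rb")


def detect_xml_flavor(path):
    """Return 'medline', 'pubmed_oa', or 'unknown' based on the root element tag."""
    with open_maybe_gzip(path) as fh:
        # The first 'start' event is the root element, so we stop right away
        # instead of loading the whole (often very large) file.
        for _event, elem in ET.iterparse(fh, events=("start",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
            if tag in MEDLINE_ROOTS:
                return "medline"
            if tag in PUBMED_OA_ROOTS:
                return "pubmed_oa"
            return "unknown"
    return "unknown"
```

If `detect_xml_flavor(path)` returns `'medline'`, call `pp.parse_medline_xml(path)`; if it returns `'pubmed_oa'`, call `pp.parse_pubmed_xml(path)`. This avoids handing a MEDLINE file to the Open-Access parser, which is what produced the traceback in this issue.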