wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

Don't exit if PMID is missing #61

Closed: Daniel-Mietchen closed this issue 11 years ago

Daniel-Mietchen commented 11 years ago

For newly published articles, the PMID may be missing even when the PMCID and DOI are already available. Currently this causes the script to exit before the upload, but I would prefer the upload to go ahead without the PMID, which can always be added later.

Sample log:

Input PMCIDs, delimited by whitespace:
Removing “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_pmcid/efetch.fcgi0” … done.
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3511399”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_pmcid” … 100% |###…###|
Skipping 12 records …
PLOS ONE 2012 Dynamic Volume Changes in Astrocytes Are an Intrinsic Phenomenon Mediated by Bicarbonate Ion Flux
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
“Dynamic Volume Changes in Astrocytes Are an Intrinsic Phenomenon Mediated by Bicarbonate Ion Flux”: 2 × video/quicktime

Checking MIME types … 100% |###…###|
Unknown, possibly non-free license: http://www.frontiersin.org/licenseagreement
Unknown, possibly non-free license: http://www.frontiersin.org/licenseagreement
Unknown, possibly non-free license: http://www.frontiersin.org/licenseagreement
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license: http://creativecommons.org/licenses/by-nc-sa/3.0/
Unknown, possibly non-free license:
Downloading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511399/bin/pone.0051124.s001.mov, saving into directory “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid” … 100% |###…###|
Downloading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511399/bin/pone.0051124.s002.mov, saving into directory “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid” … 100% |###…###|
Converting “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid/pone.0051124.s001.mov”, saving into “/home/daniel/.cache/open-access-media-importer/media/refined/pmc_pmcid/pone.0051124.s001.mov.ogv” … 35% |###…### done.|###…### |
Converting “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid/pone.0051124.s002.mov”, saving into “/home/daniel/.cache/open-access-media-importer/media/refined/pmc_pmcid/pone.0051124.s002.mov.ogv” … 50% |###…### done.|###…### |
Traceback (most recent call last):
  File "./oa-put", line 70, in <module>
    article_pmid = efetch.get_pmid_from_doi(article_doi)
  File "/home/daniel/open-access-media-importer/helpers/efetch.py", line 33, in get_pmid_from_doi
    return tree.find('IdList/Id').text
AttributeError: 'NoneType' object has no attribute 'text'
Input PMCIDs, delimited by whitespace:
Removing “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_pmcid/efetch.fcgi0” … done.

erlehmann commented 11 years ago

Fixed by 5e76cfac762bcc599a553c3e259a95e3601754e2.

Daniel-Mietchen commented 11 years ago

Still getting errors on this one:

Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3511835”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_pmcid” … 100% |###…###|
Skipping 3 records …
Evidence-based Complementary and Alternative Medicine 2012 Bioassay-Guided Isolation and HPLC Determination of Bioactive Compound That Relate to the Antiplatelet Activity (Adhesion, Secretion, and Aggregation) from Solanum lycopersicum
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
“Bioassay-Guided Isolation and HPLC Determination of Bioactive Compound That Relate to the Antiplatelet Activity (Adhesion, Secretion, and Aggregation) from Solanum lycopersicum”: 3 × video/x-ms-wmv

Checking MIME types …
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517510/bin/1471-2148-12-81-S1.doc, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517510/bin/1471-2148-12-81-S2.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517510/bin/1471-2148-12-81-S3.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517358/bin/1472-6920-12-70-S1.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517358/bin/1472-6920-12-70-S2.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517534/bin/1745-6215-13-191-S1.pdf, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f1.wmv, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f2.wmv, the following error occured: “HTTP Error 416: Wrong Request”.
When trying to download http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f3.wmv, the following error occured: “HTTP Error 416: Wrong Request”.
Downloading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f1.wmv, saving into directory “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid” … 100% |###…###|
Downloading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f2.wmv, saving into directory “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid” … 100% |###…###|
Downloading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511835/bin/147031.f3.wmv, saving into directory “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid” … 100% |###…###|
Converting “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid/147031.f1.wmv”, saving into “/home/daniel/.cache/open-access-media-importer/media/refined/pmc_pmcid/147031.f1.wmv.ogv” … 2% |###…### done.|###…### |
Converting “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid/147031.f2.wmv”, saving into “/home/daniel/.cache/open-access-media-importer/media/refined/pmc_pmcid/147031.f2.wmv.ogv” … 3% |###…### done.|###…### |
Converting “/home/daniel/.cache/open-access-media-importer/media/raw/pmc_pmcid/147031.f3.wmv”, saving into “/home/daniel/.cache/open-access-media-importer/media/refined/pmc_pmcid/147031.f3.wmv.ogv” … 0% | done.|###…### |
Traceback (most recent call last):
  File "./oa-put", line 86, in <module>
    categories += efetch.get_categories_from_pmid(article_pmid)
  File "/home/daniel/open-access-media-importer/helpers/efetch.py", line 47, in get_categories_from_pmid
    raise TypeError, "Cannot get Categories for PMID %s of type %s." % (pmid, type(pmid))
TypeError: Cannot get Categories for PMID 2012 of type <type 'str'>

Daniel-Mietchen commented 11 years ago

Reopening due to "TypeError: Cannot get Categories for PMID 2012 of type <type 'str'>" (see the end of the log in the previous comment).
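This second traceback is a different failure: `get_categories_from_pmid` rejects the PMID because of its type (`str`). Below is a minimal sketch of a guard that skips the category lookup when no usable PMID is available and otherwise normalizes the value before the call. The helper name `categories_for_pmid` is hypothetical, and because the log does not show which type `get_categories_from_pmid` expects, the sketch assumes an `int`. A type check alone cannot tell whether a numeric string is really the article's PMID: 2012 is also the publication year printed earlier in the same log, so the value may need checking where it is extracted.

```python
import re
from typing import Callable, List, Optional

# Hypothetical signature: a category lookup that takes an integer PMID,
# standing in for something like efetch.get_categories_from_pmid.
CategoryLookup = Callable[[int], List[str]]


def categories_for_pmid(pmid: Optional[str], lookup: CategoryLookup) -> List[str]:
    """Return categories for a PMID, or [] when no usable PMID exists."""
    if pmid is None:
        # Article has no PMID yet: upload without PMID-derived categories.
        return []
    text = str(pmid).strip()
    if not re.fullmatch(r"[0-9]+", text):
        # Not a plain ASCII digit string, so not a plausible PMID: skip it.
        return []
    # Normalize the type before calling the lookup (assumed to want an int).
    return lookup(int(text))


if __name__ == "__main__":
    def fake_lookup(pmid: int) -> List[str]:
        # Stand-in lookup for demonstration; returns a made-up category.
        return ["Category for PMID %d" % pmid]

    print(categories_for_pmid(None, fake_lookup))
    print(categories_for_pmid("PMC3511835", fake_lookup))
    print(categories_for_pmid("12345678", fake_lookup))
```

Returning an empty list rather than raising lets the upload proceed with whatever categories come from other sources.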

erlehmann commented 11 years ago

Fixed by 5b4084c0c9e30e8a662b0e456efe77f7c1d4178a.