wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

Assertion error #16

Status: Closed (closed by Daniel-Mietchen 11 years ago)

Daniel-Mietchen commented 11 years ago

/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
Traceback (most recent call last):
  File "./oa-get", line 55, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError

Affected DOIs: 10.1371/journal.pbio.0020012 10.1186/1741-7007-10-72

erlehmann commented 11 years ago

This happens when there are no materials. Did the efetch download fail before?
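
A minimal sketch of how the empty case could be guarded, assuming the caller can simply skip the progress bar when there is nothing to process. `StandInProgressBar` and `make_progressbar` are hypothetical names used for illustration; the stand-in class only mimics the `assert maxval > 0` check visible in the traceback and is not the real `progressbar` module.

```python
class StandInProgressBar(object):
    """Hypothetical stand-in that mimics the `assert maxval > 0` check."""

    def __init__(self, maxval):
        assert maxval > 0
        self.maxval = maxval


def make_progressbar(materials, bar_factory=StandInProgressBar):
    """Return a progress bar, or None when there are no materials."""
    if len(materials) == 0:
        # Nothing to download or check, so never reach the assertion.
        return None
    return bar_factory(maxval=len(materials))


if __name__ == "__main__":
    for materials in ([], ["video1.mov", "video2.mov"]):
        bar = make_progressbar(materials)
        if bar is None:
            print("No materials found, nothing to do.")
        else:
            print("Progress bar created with maxval=%d" % bar.maxval)
```

With a guard like this, an article without any recognised materials would end in a readable message rather than an `AssertionError`.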

Daniel-Mietchen commented 11 years ago

Just reran these two, pasting in full log below:

daniel@oami-host:~/open-access-media-importer$ echo 10.1371/journal.pbio.0020012 | ./oami_pmc_doi_import
Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 322746
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=322746”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
100% |##########################################################################################################################################################################################################|
toc
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
Traceback (most recent call last):
  File "./oa-get", line 55, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError
daniel@oami-host:~/open-access-media-importer$ echo 10.1186/1741-7007-10-72 | ./oami_pmc_doi_import
Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
When trying to download , the following error occured: “HTTP Error 400: Bad Request”.                                                      |
Traceback (most recent call last):
  File "./oa-cache", line 206, in <module>
    skip=skip
  File "/home/daniel/open-access-media-importer/sources/pmc_doi.py", line 71, in list_articles
    result_tree.parse(path.join(target_directory, filename))
  File "<string>", line 38, in parse
cElementTree.ParseError: no element found: line 1, column 0
Traceback (most recent call last):
  File "./oa-get", line 55, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError

erlehmann commented 11 years ago

Probably fixed by 3ad58d7fe80afcbf4d8255e4902a124eff1a1bdd.

Daniel-Mietchen commented 11 years ago

Not fixed yet:

daniel@oami-host:~/open-access-media-importer$ echo 10.1371/journal.pbio.0020012 | ./oami_pmc_doi_import
Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 322746
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=322746”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
100% |#########################################################################|
toc
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
Checking MIME types …
Traceback (most recent call last):
  File "./oa-get", line 56, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError
daniel@oami-host:~/open-access-media-importer$ echo 10.1186/1741-7007-10-72 | ./oami_pmc_doi_import
Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … Traceback (most recent call last):
  File "./oa-get", line 113, in <module>
    for result in source_module.download_metadata(source_path):
  File "/home/daniel/open-access-media-importer/sources/pmc_doi.py", line 56, in download_metadata
    raise RuntimeError, 'No PubMed Central IDs for given DOIs found.'
RuntimeError: No PubMed Central IDs for given DOIs found.

erlehmann commented 11 years ago

In the first example, the videos are not in <supplementary-material> elements but in <fig> elements. I am looking into the issue.
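
A rough sketch of what looking in both places could mean, assuming the media files appear as `<media>` elements carrying an `xlink:href` attribute inside either container. `find_media_hrefs` and the sample XML are made up for illustration; this is not the code from the importer or from the eventual fix.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, heavily simplified article fragment.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="v1"><media xlink:href="video1.mov"/></fig>
    <supplementary-material id="s1">
      <media xlink:href="data1.avi"/>
    </supplementary-material>
  </body>
</article>"""


def find_media_hrefs(xml_text):
    """Collect media hrefs from <supplementary-material> and <fig> elements."""
    root = ET.fromstring(xml_text)
    hrefs = []
    for container_tag in ("supplementary-material", "fig"):
        for container in root.iter(container_tag):
            for media in container.iter("media"):
                href = media.get(XLINK_HREF)
                # Skip media without a link and avoid duplicates.
                if href is not None and href not in hrefs:
                    hrefs.append(href)
    return hrefs


if __name__ == "__main__":
    for href in find_media_hrefs(SAMPLE):
        print(href)
```

Searching only `<supplementary-material>` would return an empty list for an article whose videos sit in `<fig>` elements, which is consistent with the empty `materials` seen in the traceback.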

erlehmann commented 11 years ago

Issue with DOI 10.1371/journal.pbio.0020012 fixed by dc1d8026158ee28edabe5015a3703ecfff1cdbe6, looking into issue with DOI 10.1186/1741-7007-10-72.

erlehmann commented 11 years ago

For DOI 10.1186/1741-7007-10-72, efetch returns no PMCID.
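
A small sketch of failing early when no PMCID is available, so that no request with an empty `id=` parameter is sent (the earlier log shows such a request ending in "HTTP Error 400: Bad Request"). The URL prefix and the error message are taken from the logs above; `build_efetch_url` is a hypothetical helper, and joining several IDs with commas is an assumption about how multiple IDs would be passed.

```python
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id="


def build_efetch_url(pmcids):
    """Return the efetch URL for the given PMCIDs, or raise if none are usable."""
    # Drop empty or whitespace-only entries such as the blank "found: " result.
    ids = [str(pmcid).strip() for pmcid in pmcids if str(pmcid).strip()]
    if not ids:
        raise RuntimeError("No PubMed Central IDs for given DOIs found.")
    # Assumption: several IDs are passed as one comma-separated list.
    return EFETCH_URL + ",".join(ids)


if __name__ == "__main__":
    print(build_efetch_url(["322746"]))
    try:
        build_efetch_url([""])
    except RuntimeError as error:
        print("Error: %s" % error)
```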

Daniel-Mietchen commented 11 years ago

Some more cases: 10.3389/fncir.2012.00064 10.3389/fnint.2012.00054 10.3389/fpls.2012.00151 10.3389/fphys.2012.00293

echo 10.3389/fpls.2012.00151 | ./oami_pmc_doi_import
Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 3395867
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3395867”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
100% |##########################################################################################################################################################################################################|
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
Checking MIME types …
Traceback (most recent call last):
  File "./oa-get", line 56, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError

Daniel-Mietchen commented 11 years ago

Some more cases: 10.3389/fncir.2012.00064 10.3389/fnint.2012.00054 10.3389/fphys.2012.00293 10.3389/fpls.2012.00151 10.3389/fphys.2012.00414 10.3389/fmicb.2012.00338

Sample output:

Removing “/home/daniel/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 3410411
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3410411”, saving into directory “/home/daniel/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
100% |##########################################################################################################################################################################################################|
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
Checking MIME types …
Traceback (most recent call last):
  File "./oa-get", line 56, in <module>
    p = progressbar.ProgressBar(maxval=len(materials))
  File "/usr/lib/pymodules/python2.7/progressbar.py", line 218, in __init__
    assert maxval > 0
AssertionError