wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

CC0 not recognized #124

Open Daniel-Mietchen opened 10 years ago

Daniel-Mietchen commented 10 years ago

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC3907294 has the text

<license>
    <license-p>
        This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
    </license-p>
</license>

This is precisely the same text as in https://github.com/erlehmann/open-access-media-importer/blob/master/sources/pmc.py#L235, which maps to http://creativecommons.org/publicdomain/zero/1.0/, so it should be recognized as open.
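For illustration, here is a minimal sketch of the kind of text-to-URL lookup described above. It is not the code in `sources/pmc.py`: the names `CC0_TEXT`, `LICENSE_URL_BY_TEXT`, `normalize` and `license_url_from_xml` are hypothetical, and the whitespace normalization is only an assumption about one thing worth ruling out, since the `<license-p>` text in the XML is surrounded by newlines and indentation.

```python
import xml.etree.ElementTree as ET

CC0_URL = "http://creativecommons.org/publicdomain/zero/1.0/"

# License statement quoted in this issue.
CC0_TEXT = (
    "This is an open-access article, free of all copyright, and may be "
    "freely reproduced, distributed, transmitted, modified, built upon, or "
    "otherwise used by anyone for any lawful purpose. The work is made "
    "available under the Creative Commons CC0 public domain dedication."
)

# Hypothetical lookup table; the real mapping lives in sources/pmc.py.
LICENSE_URL_BY_TEXT = {CC0_TEXT: CC0_URL}


def normalize(text):
    """Collapse runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def license_url_from_xml(xml_fragment):
    """Return the license URL for a <license> fragment, or None if unknown."""
    root = ET.fromstring(xml_fragment)
    for license_p in root.iter("license-p"):
        # itertext() also picks up text nested in child elements.
        text = normalize("".join(license_p.itertext()))
        url = LICENSE_URL_BY_TEXT.get(text)
        if url is not None:
            return url
    return None


# Same layout as the XML above: the text sits on its own indented line.
SAMPLE = (
    "<license>\n"
    "    <license-p>\n"
    "        %s\n"
    "    </license-p>\n"
    "</license>" % CC0_TEXT
)

print(license_url_from_xml(SAMPLE))
```

With the normalization in place, the indented sample resolves to the CC0 URL; a lookup on the raw, un-normalized element text would miss it because of the leading and trailing whitespace.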

danielmietchen@files:~/open-access-media-importer$ git pull
Already up-to-date.
danielmietchen@files:~/open-access-media-importer$ ./oa-cache clear-database pmc_doi
Removing “/home/danielmietchen/.local/share/open-access-media-importer/pmc_doi.sqlite” … done.
danielmietchen@files:~/open-access-media-importer$ echo 10.1371/journal.pcbi.1003447 | ./oami_pmc_doi_import
Input DOIs, delimited by whitespace: Getting PubMed Central IDs for given DOIs … found: 3907294
Downloading “http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3907294”, saving into directory “/home/danielmietchen/.cache/open-access-media-importer/metadata/raw/pmc_doi” …
100% |#########################################################################################################################################################################################################|
/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:463: SAWarning: Unicode type received non-unicode bind param value.
  param.append(processors[key](compiled_params[key]))
“A Division in PIN-Mediated Auxin Patterning during Organ Initiation in Grasses”:
        20 × /

Checking MIME types …
No materials found where MIME type has to be checked.
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>
Unknown, possibly non-free license: <None>

Daniel-Mietchen commented 10 years ago

Another example: 10.1371/journal.pone.0086680

Daniel-Mietchen commented 9 years ago

@erlehmann can you please take a look at this? I just tried a fix via https://github.com/wpoa/open-access-media-importer/commit/f42313c7c612bf3aead1813a89cac06eece19d5d but it does not seem to have been enough.

Daniel-Mietchen commented 9 years ago

Does not work.

erlehmann commented 9 years ago

Please hold off on this until the wmde-review branch is merged. It contains a refactoring of some of the logic for the licensing heuristics. Daniel Kinzler said he would put 2 hours into that this week. Meanwhile, I can add this as a test case to the wmde-review branch and see if it works.

erlehmann commented 9 years ago

Works on the wmde-review branch, see commit aee79b30678fb49a947404b20871d3b21d54c240.

Daniel-Mietchen commented 9 years ago

OK, thanks.