wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

Account for new PLOS license statement #121

Closed Daniel-Mietchen closed 10 years ago

Daniel-Mietchen commented 10 years ago

PLOS have recently changed from

<license>
  <license-p>This is an open-access article distributed under the terms of the 
    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
  </license-p>
</license>

(example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC3919755 ) to

<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
  <license-p>This is an open-access article distributed under the terms of the
    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
  </license-p>
</license>

(example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC3913557 ). I think https://github.com/erlehmann/open-access-media-importer/blob/master/sources/pmc.py#L490 to https://github.com/erlehmann/open-access-media-importer/blob/master/sources/pmc.py#L500 has to be adapted accordingly.

Currently, these new license statements are interpreted as None.

Klortho commented 10 years ago

I wonder whether the cause of this problem is the "4.0", since CC released that version only recently. For forward compatibility, you might consider making the license matching ignore the version number, i.e., treating any license URL of the form http://creativecommons.org/licenses/by/\d\.\d/ as CC-BY.
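
A minimal sketch of such version-agnostic matching, in Python. The names `CC_BY_URL` and `is_cc_by` are hypothetical and not taken from the OAMI code base; the pattern is also slightly broader than the one above, since it assumes that `https` URLs, multi-digit version numbers, and a missing trailing slash should be accepted as well.

```python
import re

# Hypothetical pattern: matches CC BY license URLs of any version (2.5, 3.0, 4.0, ...).
# Assumption: https and a missing trailing slash are tolerated as well.
CC_BY_URL = re.compile(r"^https?://creativecommons\.org/licenses/by/\d+\.\d+/?$")


def is_cc_by(url):
    """Return True if url looks like a CC BY license URL, regardless of version."""
    return bool(CC_BY_URL.match(url.strip()))
```

Because the pattern requires a slash directly after `by`, other license families such as `by-nc` or `by-sa` are not matched.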

Daniel-Mietchen commented 10 years ago

The problem is not the 4.0 license (which has been used by PLOS since mid-December) but the inclusion of the xlink statement into the license tag (which was introduced earlier this month). Not sure that is proper JATS (may well be), but in any case, it leads to all PLOS stuff now being labeled as licensed "None".

Klortho commented 10 years ago

I checked again, and in fact, this is contrary to the PMC tagging guidelines. Here's the new PLOS tagging, again, indented:

<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
  <license-p>This is an open-access article distributed under the terms of the 
    <ext-link ext-link-type="uri" 
    xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 
    License</ext-link>, which permits unrestricted use, distribution, and reproduction in 
    any medium, provided the original author and source are credited.
  </license-p>
</license>

And here is what the PMC tagging guidelines have to say:

When including a URI for a license, the URI should be included either on the <license> in @xlink:href or in an <ext-link> in the content of the license. If the URI appears in the license text, tag the URI as an <ext-link> in the content. If the URI does not appear in the license text, tag it as @xlink:href on <license>. The URI must not be tagged in both places.

Daniel-Mietchen commented 10 years ago

Any idea why the validation at PMC does not pick that up?

Klortho commented 10 years ago

Hi, I just checked. Apparently (I didn't know this) these are "just guidelines", and the stylechecker doesn't have checks for everything that is called out here.

Probably the best thing to do is to fix the OAMI to handle this. It should always prefer the URI in the license/@xlink:href attribute, if it is present.
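
The lookup order suggested here could be sketched roughly as follows. This is not the actual fix in sources/pmc.py: `license_url`, `OLD_STYLE`, and `NEW_STYLE` are hypothetical names, and the samples declare the xlink namespace themselves so that they parse as standalone fragments.

```python
import xml.etree.ElementTree as ET

# Attribute name for xlink:href in ElementTree's {namespace}local notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def license_url(license_element):
    """Return the license URI of a <license> element, or None if there is none.

    Prefers license/@xlink:href and falls back to the first
    <ext-link> with an xlink:href inside the license text.
    """
    href = license_element.get(XLINK_HREF)
    if href:
        return href.strip()
    for ext_link in license_element.iter("ext-link"):
        href = ext_link.get(XLINK_HREF)
        if href:
            return href.strip()
    return None


# Abbreviated samples of the old and new PLOS tagging (license text shortened).
OLD_STYLE = """<license xmlns:xlink="http://www.w3.org/1999/xlink">
  <license-p>This is an open-access article distributed under the terms of the
    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>.
  </license-p>
</license>"""

NEW_STYLE = OLD_STYLE.replace(
    "<license xmlns:xlink",
    '<license xlink:href="http://creativecommons.org/licenses/by/4.0/" xmlns:xlink',
    1,
)

if __name__ == "__main__":
    for sample in (OLD_STYLE, NEW_STYLE):
        print(license_url(ET.fromstring(sample)))
```

With this order, both the old and the new tagging yield the same URI, and a license element without any URI yields None.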

Daniel-Mietchen commented 10 years ago

Some more examples for testing: 10.1371/journal.pone.0088014 10.1371/journal.pone.0088612 10.1371/journal.pone.0089000 10.1371/journal.pone.0087663 10.1371/journal.pone.0087662 10.1371/journal.pone.0087661 10.1371/journal.pone.0087649 10.1371/journal.pone.0087644

erlehmann commented 10 years ago

Issue fixed with 32fcabcb4740d1e34f793d9f2acfc436f50dc489. Test for issue is at “tests/118-plos-license-statement.do”.

Daniel-Mietchen commented 10 years ago

Similar issue is in https://github.com/erlehmann/open-access-media-importer/issues/124 .