wpoa / recitation-bot

MediaWiki bot to upload content to Wikimedia projects and update corresponding citations on Wikipedia.
GNU General Public License v3.0

License statement missing #18

Open Daniel-Mietchen opened 10 years ago

Daniel-Mietchen commented 10 years ago

Instead of a license statement, some articles contain only an empty template, "{{}}".

Examples: https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Analysis_of_Human_Cytomegalovirus-Encoded_SUMO_Targets_and_Temporal_Regulation_of_SUMOylation_of_the_Immediate-Early_Proteins_IE1_and_IE2_during and https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/A_cladistically_based_reinterpretation_of_the_taxonomy_of_two_Afrotropical_tenebrionid_genera_Ectateus_Koch_1956_and_Selinus_Mulsant_%26_Rey_1853_%28

Daniel-Mietchen commented 10 years ago

Others do not even have the curly brackets, e.g. https://en.wikisource.org/w/index.php?title=Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Modelling_the_Species_Distribution_of_Flat-Headed_Cats_(Prionailurus_planiceps)_an_Endangered_South-East_Asian_Small_Felid&oldid=5032566 .

For an example that worked fine, see https://en.wikisource.org/w/index.php?title=Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Phylogeny_and_antiquity_of_M_macrohaplogroup_inferred_from_complete_mt_DNA_sequence_of_Indian_specific_lineages&oldid=4978686 .
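The two failure modes above (an empty "{{}}" and no template at all) can be told apart mechanically. The sketch below is a hypothetical checker for generated page text. It assumes that a correct page carries a license template whose name starts with "CC-" (for example "{{CC-BY-4.0}}"), which is an illustrative convention, not the bot's confirmed output.

```python
import re

# Matches a template call such as {{name}} or {{name|arg=...}};
# group 1 is the (possibly empty) template name.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]*?)\s*(?:\|[^{}]*)?\}\}")


def license_status(wikitext: str) -> str:
    """Classify a generated page as 'ok', 'empty' ({{}}) or 'missing'.

    Assumption (hypothetical): license templates are named 'CC-...'.
    """
    names = [m.group(1) for m in TEMPLATE_RE.finditer(wikitext)]
    if any(name.upper().startswith("CC-") for name in names):
        return "ok"
    if "" in names:
        return "empty"  # the "{{}}" case reported above
    return "missing"    # no license template at all
```

Running this over the imported pages would separate the "{{}}" articles from those with no braces at all.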

wrought commented 10 years ago

This was partly an issue in the XSLT, which needed support for the 4.0 licenses.
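One way to avoid adding a case for every new license version is to parse the URI instead of matching it literally. The following sketch is not the J2MW XSLT itself; it extracts the license type and version from a Creative Commons URI. The template naming scheme in the hypothetical `template_for_uri` ("CC-BY-4.0" and so on) is an assumption for illustration. Returning `None` instead of an empty string lets the caller flag the article rather than write "{{}}".

```python
import re

# Creative Commons license URIs of any version, e.g.
# http://creativecommons.org/licenses/by/3.0/ or .../by-sa/4.0/legalcode
CC_URI_RE = re.compile(
    r"^https?://(?:www\.)?creativecommons\.org/licenses/"
    r"(?P<kind>[a-z-]+)/(?P<version>\d\.\d)(?:/|$)",
    re.IGNORECASE,
)


def cc_license_from_uri(uri):
    """Return e.g. ('by', '4.0') for a CC license URI, or None."""
    m = CC_URI_RE.match(uri.strip())
    if m is None:
        return None
    return m.group("kind").lower(), m.group("version")


def template_for_uri(uri):
    """Hypothetical template name such as 'CC-BY-4.0', or None if unknown."""
    parsed = cc_license_from_uri(uri)
    if parsed is None:
        return None  # caller should flag this, not emit "{{}}"
    kind, version = parsed
    return f"CC-{kind.upper()}-{version}"
```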

Now this one works: https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/A_cladistically_based_reinterpretation_of_the_taxonomy_of_two_Afrotropical_tenebrionid_genera_Ectateus_Koch_1956_and_Selinus_Mulsant_%26_Rey_1853_(Coleoptera_Tenebrionidae_Platynotin

However, there may be a different issue with https://tools.wmflabs.org/recitation-bot/10.1371/journal.pone.0009612.html. That article's license was already supported, so it should work, but it doesn't:

http://creativecommons.org/licenses/by/3.0/

This case is particularly conspicuous because there are no curly braces at all.

Further, it is hard to tell what is going on with https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Analysis_of_Human_Cytomegalovirus-Encoded_SUMO_Targets_and_Temporal_Regulation_of_SUMOylation_of_the_Immediate-Early_Proteins_IE1_and_IE2_during_Infection because there is an opaque issue with the upload: https://tools.wmflabs.org/recitation-bot/doi:10.1371/journal.pone.0103308.html

doi: doi:10.1371/journal.pone.0103308

success: failed

'records'
Daniel-Mietchen commented 10 years ago

@notconfusing, can you see why this upload failed?

notconfusing commented 10 years ago

@Daniel-Mietchen It's because the DOI you entered was doi:10.1371/journal.pone.0103308, when you probably meant 10.1371/journal.pone.0103308 (without the "doi:" prefix).
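A front end could accept both forms. This is a minimal sketch, assuming the bot only needs the bare DOI beginning with "10."; `normalize_doi` is a hypothetical helper, not part of recitation-bot.

```python
import re

# Prefixes users commonly paste in front of a bare DOI.
_DOI_PREFIX_RE = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE
)


def normalize_doi(raw: str) -> str:
    """Strip a leading 'doi:' or doi.org URL and check the result looks like a DOI."""
    doi = _DOI_PREFIX_RE.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi
```

Rejecting malformed input up front would give a clear error instead of the opaque "'records'" failure shown above.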

notconfusing commented 10 years ago

OK!

Here's what I found out: J2MW determines the license by looking for a URI. pmc_extractor.py, on the other hand, uses code taken from OAMI that looks for a URI and falls back on text matching, as in https://github.com/wpoa/Open-License-Dictionary/

So really, there might not be any point in J2MW doing this at all, since we can do it more accurately later on. @Klortho, would you be comfortable removing this code and letting Python post-process the license later in the pipeline?
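The URI-first, text-fallback approach described for pmc_extractor.py can be sketched as follows. The two small tables are stand-ins for the Open-License-Dictionary, whose actual format may differ, and the license names they return are illustrative only.

```python
import re

# Illustrative stand-ins for the Open-License-Dictionary (hypothetical entries).
URI_TO_LICENSE = {
    "creativecommons.org/licenses/by/3.0": "CC-BY-3.0",
    "creativecommons.org/licenses/by/4.0": "CC-BY-4.0",
}
TEXT_PATTERNS = [
    (re.compile(r"creative\s+commons\s+attribution\s+4\.0", re.I), "CC-BY-4.0"),
    (re.compile(r"creative\s+commons\s+attribution\s+3\.0", re.I), "CC-BY-3.0"),
]


def _uri_key(uri):
    """Normalize scheme, 'www.', case and trailing slash before lookup."""
    key = re.sub(r"^https?://(?:www\.)?", "", uri.strip().lower())
    return key.rstrip("/")


def detect_license(uri=None, text=""):
    """Try the license URI first; fall back on matching the license text."""
    if uri:
        found = URI_TO_LICENSE.get(_uri_key(uri))
        if found:
            return found
    for pattern, name in TEXT_PATTERNS:
        if pattern.search(text):
            return name
    return None  # unknown: report it rather than emit an empty template
```

Doing this in one Python step later in the pipeline means the XSLT does not need its own copy of the license table.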

Klortho commented 10 years ago

I think you're talking about this template? It looks like it just writes a template into the content of the article. I don't have any problem removing it. Rather than removing it completely, though, I can add an XSLT parameter that controls it and defaults to "false". Would that work?

notconfusing commented 10 years ago

@Klortho, having it default to false is fine. Would it typically be switched to true by a command-line argument?
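As a sketch of how such a switch could look, assuming the stylesheet is run with xsltproc and exposes a string parameter named `emit_license_template` (both the invocation and the parameter name are hypothetical here), a command-line flag that defaults to false can be forwarded like this:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Convert an article with the J2MW XSLT.")
    parser.add_argument(
        "--emit-license-template",
        action="store_true",  # absent -> False, matching the XSLT default
        help="ask the XSLT to write its own license template",
    )
    return parser


def xsltproc_command(args, stylesheet, article):
    # --stringparam passes a string, so the (hypothetical) stylesheet
    # would compare the parameter with 'true', not the boolean true().
    flag = "true" if args.emit_license_template else "false"
    return ["xsltproc", "--stringparam", "emit_license_template", flag,
            stylesheet, article]
```

If the pipeline uses lxml instead of xsltproc, the same string could be passed with `lxml.etree.XSLT.strparam`.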

Klortho commented 10 years ago

I commented out the license-generating code in the XSLT and verified that it no longer generates license templates. I pushed the change to the master branch of that repo.