wpoa / recitation-bot

MediaWiki bot to upload content to Wikimedia projects and update corresponding citations on Wikipedia.
GNU General Public License v3.0

Make force_reupload checkboxes for text / images / both #37

Closed: Daniel-Mietchen closed this issue 9 years ago

Daniel-Mietchen commented 9 years ago

The default should be both.

Daniel-Mietchen commented 9 years ago

This would help to avoid cases like https://commons.wikimedia.org/wiki/User_talk:VIAFbot#.7B.7BAutotranslate.7C1.3DFile:New-Family-of-Bluish-Pyranoanthocyanins-40403.fig.001.jpg.7Cbase.3DImage_license.2Fheading.7D.7D .

notconfusing commented 9 years ago

I think "reupload images" implicitly implies "reupload text": if the image file names change during the image re-upload, then we have to modify the wikitext of the article.

So, more precisely, I think the options should be "reupload text only" and "reupload images and text".
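A minimal sketch of the two options proposed above, assuming a hypothetical `ReuploadMode` enum and hypothetical helpers `mode_from_checkboxes` and `plan_reupload` (none of these names come from the bot's code). It encodes the rule that ticking "images" always pulls in the text, and uses "both" as the default, as requested in the first comment.

```python
from enum import Enum


class ReuploadMode(Enum):
    """Hypothetical force-reupload options, following the proposal in this thread."""
    NONE = "none"
    TEXT_ONLY = "text_only"
    IMAGES_AND_TEXT = "images_and_text"


# Default per the first comment on this issue: re-upload both.
DEFAULT_MODE = ReuploadMode.IMAGES_AND_TEXT


def mode_from_checkboxes(text: bool, images: bool) -> ReuploadMode:
    """Map two (hypothetical) web-form checkboxes to a mode.

    Checking "images" implies "text", because changed image file names
    would have to be written back into the article's wikitext.
    """
    if images:
        return ReuploadMode.IMAGES_AND_TEXT
    if text:
        return ReuploadMode.TEXT_ONLY
    return ReuploadMode.NONE


def plan_reupload(mode: ReuploadMode) -> set:
    """Return which parts of an article to re-upload for a given mode."""
    if mode is ReuploadMode.TEXT_ONLY:
        return {"text"}
    if mode is ReuploadMode.IMAGES_AND_TEXT:
        return {"images", "text"}
    return set()
```

With this design there is no "images only" state to represent, so the web interface cannot request a re-upload that leaves the wikitext pointing at stale file names.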

notconfusing commented 9 years ago

@Daniel-Mietchen @wrought I am in the middle of this, and I was thinking about an issue. Whenever we process an article, we keep all our metadata on it as of a time t. If we re-upload at time t+1, do we need to store the old metadata from time t for any reason?

Keeping it would change some assumptions about how the database is stored and would require some re-architecting: right now a DOI is associated with a single set of journal article metadata, but if we keep old versions, then a DOI has to be associated with a list of time-stamped article metadata.
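To make the trade-off concrete, here is a minimal in-memory sketch of the two data models, assuming plain dictionaries as a stand-in for the bot's actual database (the class names `OverwritingStore` and `VersionedStore` are hypothetical). The first keeps one metadata record per DOI and overwrites it on re-upload; the second keeps a list of time-stamped snapshots per DOI.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class OverwritingStore:
    """Current model: each DOI maps to one metadata record; a re-upload overwrites it."""
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def save(self, doi: str, metadata: Dict[str, Any]) -> None:
        self.records[doi] = metadata

    def latest(self, doi: str) -> Dict[str, Any]:
        return self.records[doi]


@dataclass
class VersionedStore:
    """Alternative: each DOI maps to a list of (timestamp, metadata) snapshots."""
    records: Dict[str, List[Tuple[datetime, Dict[str, Any]]]] = field(default_factory=dict)

    def save(self, doi: str, metadata: Dict[str, Any],
             when: Optional[datetime] = None) -> None:
        # Default to the current UTC time if no timestamp is given.
        stamp = when or datetime.now(timezone.utc)
        self.records.setdefault(doi, []).append((stamp, metadata))

    def latest(self, doi: str) -> Dict[str, Any]:
        # Return the metadata from the most recent snapshot.
        return max(self.records[doi], key=lambda entry: entry[0])[1]
```

Both expose the same `latest()` lookup, so code that only needs current metadata would not change; the cost of versioning shows up in storage and in every place that reads or writes the record directly.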

Daniel-Mietchen commented 9 years ago

I think we can overwrite the metadata by default, though in the long run, given the scale we envisage this operating at, there will be cases where the metadata we can pull from PMC, CrossRef or elsewhere is wrong. So we will have to make provisions to fix it in a way that does not require going out to all copies of the metadata by hand (i.e. on Wikisource, Commons, Wikipedia etc.). Hopefully, Wikidata will solve this.

Daniel-Mietchen commented 9 years ago

Re "reupload text only" and "reupload images and text":

I do not see a reason why the figure file names should change between re-uploads, though we should watch out for cases when our uploads have been renamed by someone else.

notconfusing commented 9 years ago

How often does a correction released for an article include the addition or removal of an image?

Daniel-Mietchen commented 9 years ago

I'm not sure I understand what you're after here (correction by whom? article on Wikisource?), but amongst the articles we've imported so far, there are about two where some formatting (usually tables) broke the article, so that part of it was not uploaded and its images were not linked.

Otherwise, I do not see how a re-upload of the text to Wikisource would change any images on the page.

notconfusing commented 9 years ago

I meant the case where the publisher and authors of the paper release a correction article, but it's a very small edge case that is not worth worrying about right now.

Fixed in https://github.com/wpoa/recitation-bot-web-interface/commit/ad150e5f2056f34b93ce7001b8db7c88c4489650