hatnote / montage

📷 Photo evaluation tool for and by Wiki Loves competitions
https://commons.wikimedia.org/wiki/Commons:Montage
BSD 3-Clause "New" or "Revised" License

Montage should allow to upload multiple categories at once #236

Open geertivp opened 1 month ago

geertivp commented 1 month ago

The ISA Tool allows loading multiple categories at once. This is extremely convenient when related campaigns need to be processed as one unit. In the same way, Montage should allow multiple categories to be loaded into one Montage campaign. I created T299167 in January 2022, but there is still no solution. Please verify/prioritise... In addition, the file URL upload and the File list upload are broken (problems with UTF-8 characters and with " and ' in filenames).

mahmoud commented 1 month ago

Hey Geert, I believe the unicode issues have mostly been fixed with the Python 3 upgrade earlier this year. I've been pushing changes for any stragglers as they come up, so please try again.

The idea for multiple categories is an interesting one. Are the categories all part of one parent category? Or are they really disjoint? Examples appreciated.

geertivp commented 1 month ago

Thanks, Mahmoud.

I tried the CSV File URL upload, but it failed with the following error:

Internal server error: <ExceptionInfo [TypeError: startswith first arg must be bytes or a tuple of bytes, not str] (41 frames, last=Callpoint('load_name_list', 129, 'montage.loaders', './montage/loaders.py', 46, " if filename.startswith('File:'):"))>

I believe one problem might be related to ' and " in the filenames. Could a tab-separated file be a solution (using HT as the separator instead of quoted, comma-separated fields)? The File list option also returned a similar error...
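For readers unfamiliar with this TypeError: in Python 3 it is raised when `startswith` is called on a `bytes` value with a `str` argument, which suggests the file list reached that line undecoded. The following is a minimal sketch of the failure and one possible guard, not Montage's actual loader code; `normalize_filename` is a hypothetical helper, and only the `startswith('File:')` check is taken from the traceback.

```python
def normalize_filename(raw):
    """Return the name as str, without a leading 'File:' prefix.

    Hypothetical helper for illustration; not part of Montage.
    """
    # Content fetched over HTTP or read in binary mode arrives as bytes.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    name = raw.strip()
    if name.startswith("File:"):
        name = name[len("File:"):]
    return name


# Reproduce the reported error: bytes.startswith() rejects a str argument.
try:
    b"File:Example.jpg".startswith("File:")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(normalize_filename("File:Zandbergen brug over Dender.jpg".encode("utf-8")))
```

Decoding once at the boundary where the data is read keeps every later string comparison on `str` values.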

Please find the list of related but disjoint campaigns that we wanted to jury as one unit:

Please note, as a workaround, I have added all the images into the category Images from Wiki Loves Heritage Belgium in 2024: in total 2382 images.

This is not the ideal solution, but it is the only one I have available, and it is what I have done since 2022. I would really like the multiple categories upload to be (easily) implemented. This would relieve coordinators from having to use the complex CSV URL method.

geertivp commented 1 month ago

I have written a Python script to easily "merge" (= add) categories to a list of Wikimedia Commons files, based on the Category parameter. It is written specifically for Wikimedia Commons, but it works for any MediaWiki project.

    pwb add_wikitext commons commons Wiki_Loves_Denderland_2024

stdin contains the wikitext to append to each page. The script checks for duplicate content. This way Montage can load one single (merged) category…

Please find the script here. Example: File:Zandbergen brug over Dender.jpg
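The linked script is not reproduced in this thread, so the following is only a sketch of what "append wikitext unless it is already present" could look like on plain strings. `append_wikitext` and the category name are hypothetical, and no wiki is contacted.

```python
def append_wikitext(page_text, addition):
    """Append `addition` to `page_text` unless the page already contains it.

    Illustrative only; the real script's duplicate check may differ.
    """
    addition = addition.strip()
    if addition in page_text:
        return page_text  # already present: leave the page unchanged
    if not page_text.strip():
        return addition + "\n"
    return page_text.rstrip("\n") + "\n" + addition + "\n"


# Hypothetical page content and category name.
page = "== Summary ==\nSome description."
tag = "[[Category:Example merged campaign]]"

once = append_wikitext(page, tag)
twice = append_wikitext(once, tag)
print(once)
print(once == twice)
```

Running the function a second time leaves the text unchanged, which is the property that makes a mass update safe to repeat.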

geertivp commented 1 month ago

To make the proposed functionality clear:

Please look at the ISA Tool as an example of how to implement the GUI/backend part.

mahmoud commented 1 month ago

> I tried the CSV File URL upload, but it failed with the following error: Internal server error: <ExceptionInfo [TypeError: startswith first arg must be bytes or a tuple of bytes, not str] (41 frames, last=Callpoint('load_name_list', 129, 'montage.loaders', './montage/loaders.py', 46, " if filename.startswith('File:'):"))> I believe one problem might be related to ' and " in the filenames. Could a tab-separated file be a solution (using HT as the separator instead of quoted, comma-separated fields)? The File list option also returned a similar error...

Right, I figured this was due to the production Montage running a slightly outdated version of the code. I deployed the new code, so you can try the CSV method again. But actually now I see that the issue was that the File List URL provided was at a https://wikimedia.be/public/wlh/ URL, while the Montage code is currently expecting the URL to be a gist. The same file uploaded to https://gist.github.com actually worked as expected when I tried it just now.

This is technically in the field description of the File List URL field (see below), but I agree that reliance on gist alone is suboptimal. I think one reason for this was availability and versioning maybe? We can look at changing it in future, or at least adding validation to the frontend.

[Screenshot: field description of the File List URL field]
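As an illustration of the validation idea mentioned above, here is a minimal sketch of a host check that would reject a non-gist URL up front instead of failing later with a server error. The function name and the allow-list are assumptions; the thread only states that the URL is expected to be a gist.

```python
from urllib.parse import urlparse

# Assumed allow-list: the gist page host plus the host that serves raw gist content.
ALLOWED_HOSTS = {"gist.github.com", "gist.githubusercontent.com"}


def is_supported_file_list_url(url):
    """Return True if `url` is an https URL on an allowed gist host (hypothetical check)."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


print(is_supported_file_list_url("https://gist.github.com/"))
print(is_supported_file_list_url("https://wikimedia.be/public/wlh/"))
```

If other web servers were to be accepted later, the allow-list could be widened or dropped while keeping the scheme check.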

Thanks for sharing the details on the category list proposition. Just to be sure, you don't feel like having a unified category is useful for archival reasons? Like, if anyone ever wanted to browse the entries on Commons, seems like a single Category following a naming convention might be good vs having to try and find the list of the categories that were added. Might be worth discussing in a separate issue as well.

geertivp commented 1 month ago

I have one more question: How do you encode the filename when there is already a double quote in the filename? Sometimes it happens that both " and ' are within the original filename.
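For reference, the usual CSV convention is to wrap the field in double quotes and write each embedded double quote twice; single quotes need no escaping. The sketch below shows this round trip with Python's standard `csv` module, for both comma and tab delimiters. The filenames are made up, the one-column layout is an assumption, and whether Montage's loader parses input this way is not confirmed in this thread.

```python
import csv
import io

# Made-up filenames containing double quotes, single quotes and a comma.
filenames = [
    'Example "quoted" name.jpg',
    "Example with 'single' quote.jpg",
    'Both "double" and \'single\', plus a comma.jpg',
]


def write_rows(names, delimiter=","):
    """Serialize one filename per row using the csv module's default quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for name in names:
        writer.writerow([name])
    return buffer.getvalue()


def read_rows(text, delimiter=","):
    """Parse the text back into a list of filenames."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row[0] for row in reader if row]


comma_text = write_rows(filenames)
tab_text = write_rows(filenames, delimiter="\t")

print(comma_text)
print(read_rows(comma_text) == filenames)
print(read_rows(tab_text, delimiter="\t") == filenames)
```

Switching to a tab separator removes the need to quote commas, but a field containing a double quote is still quoted and doubled by the default dialect, so the parser has to understand quoting either way.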

I agree that the file URL interface should not be restricted to https://gist.github.com/ but any webserver should be accepted...

Using one single category is of course the preferred solution, but we had 4 disjoint related campaigns. To load everything in one category, I had to perform a mass update of the categories of almost 2000 images... which is not transparent to the owners of the images, nor to the general public, who see a "confusing" additional category...

Please take into account the possibility of loading multiple categories at once as described above.

Please also note that the File List (copy/paste) interface throws a similar error due to (double) quotes in some of the filenames.

Thank you very much for your answers and analysis!

CiellB commented 1 month ago

Wiki For Arabic Minorities would also like to have a range of subcategories included in their Montage campaign. I think what might be important when adding an option like this, is that the campaign coordinator can set how deep into the subcategories Montage will have to look. (taken from the glamorgan-tool)
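To make the two requests in this thread concrete (several starting categories, and a coordinator-set subcategory depth), here is a minimal sketch of a depth-limited, de-duplicating traversal. It runs on in-memory dictionaries rather than the Commons API; `collect_files`, the category names and the file names are all hypothetical.

```python
from collections import deque


def collect_files(roots, subcats, files, max_depth):
    """Collect unique files from several root categories.

    roots: list of starting category names (depth 0)
    subcats: dict mapping a category to its direct subcategories
    files: dict mapping a category to the files it contains
    max_depth: how many subcategory levels below the roots to include
    """
    seen_cats = set()   # guards against category loops and repeated visits
    seen_files = set()  # a file listed in several categories is kept once
    result = []
    queue = deque((root, 0) for root in roots)
    while queue:
        cat, depth = queue.popleft()
        if cat in seen_cats:
            continue
        seen_cats.add(cat)
        for name in files.get(cat, []):
            if name not in seen_files:
                seen_files.add(name)
                result.append(name)
        if depth < max_depth:
            for child in subcats.get(cat, []):
                queue.append((child, depth + 1))
    return result


# Hypothetical category tree with a loop (A/Sub points back to Campaign A).
subcats = {"Campaign A": ["A/Sub"], "A/Sub": ["A/Sub/Deep", "Campaign A"]}
files = {
    "Campaign A": ["a1.jpg"],
    "A/Sub": ["a2.jpg", "shared.jpg"],
    "A/Sub/Deep": ["a3.jpg"],
    "Campaign B": ["b1.jpg", "shared.jpg"],
}

for depth in (0, 1, 2):
    print(depth, collect_files(["Campaign A", "Campaign B"], subcats, files, depth))
```

A breadth-first walk is used so that each category is first reached at its smallest depth, and the visited set keeps category loops from causing endless traversal.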