openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

Digital Public Library of America #643

Open RavanJAltaie opened 1 year ago

RavanJAltaie commented 1 year ago

As part of scouting Grey Box content, we need to create a ZIM file with the details below:

Website URL: https://dp.la/browse-by-topic
License: [CC BY 3.0 License]
Desired ZIM Title: Digital Public Library of America
Desired ZIM Description: The Digital Public Library of America
Desired ZIM Icon –png (URL or attach one):
Language (ISO 639-3): eng
Is this a MediaWiki?: no

RavanJAltaie commented 1 year ago

Recipe created: https://farm.openzim.org/recipes/dp.la_en_all

RavanJAltaie commented 1 year ago

The recipe succeeded, but the scraper didn't get all the pages. Sent for troubleshooting.

benoit74 commented 7 months ago

The recipe is still not OK; it fails most of the time: https://farm.openzim.org/recipes/dp.la_en_all. I have disabled the recipe in the Zimfarm for now.

It looks like it succeeded once 2 months ago: https://library.kiwix.org/viewer#dp.la_en_all_2023-12, but I'm not sure the content is appropriate (why is it already pushing ZIMs to production if the recipe has not yet been validated?).

RavanJAltaie commented 6 months ago

I checked; the file in the library appears to have been removed or deleted. I'll tag this issue as upstream.

benoit74 commented 6 months ago

> I checked, the file in library looks like removed or deleted.

I don't get it, the file is still here: https://library.kiwix.org/viewer#dp.la_en_all_2023-12 and here: https://download.kiwix.org/zim/zimit/dp.la_en_all_2023-08.zim and also here: https://download.kiwix.org/zim/zimit/dp.la_en_all_2023-12.zim

@RavanJAltaie please confirm what we should do with these ZIMs, are they correct or should we delete them?

I've removed the Upstream label for now; we will add it back once this most urgent topic is tackled (we must not publish a ZIM if its quality is not OK).

benoit74 commented 3 months ago

Since no one seems to care now that no one is assigned, I deleted the corrupted files from the library.

A ZIM of dp.la is probably still doable with zimit if configured properly (I recommend starting with a single page, e.g. https://dp.la/primary-source-sets/aviation/sources/1924, to confirm that all images are properly retrieved, ...)
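
One possible way to do the "confirm all images are properly retrieved" check on a single page is sketched below. This is only an illustration, not part of zimit or the Zimfarm: `ImageCollector` and `missing_images` are hypothetical names, and the list of retrieved URLs is assumed to come from whatever inspection of the resulting ZIM you prefer. It only sees `<img src=...>` in static HTML, so images injected by JavaScript would need a different check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.images.append(urljoin(self.base_url, src))


def missing_images(page_html, base_url, retrieved_urls):
    """Return image URLs referenced by the page but absent from retrieved_urls.

    retrieved_urls is an assumption: any iterable of URLs known to be
    present in the scraped output.
    """
    collector = ImageCollector(base_url)
    collector.feed(page_html)
    retrieved = set(retrieved_urls)
    return [url for url in collector.images if url not in retrieved]
```

An empty list for the test page would suggest the static images were captured; a non-empty one points at the URLs to investigate before scaling the recipe up.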