Troubleshooting All the Tropes recipe

openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file

https://www.npmjs.com/package/mwoffliner

GNU General Public License v3.0


Troubleshooting All the Tropes recipe #1956

Open RavanJAltaie opened 9 months ago

RavanJAltaie commented 9 months ago

As per this issue, please check why this recipe is failing even though it has already produced a successful file in the past.

The link to the recipe on the Zimfarm: https://farm.openzim.org/recipes/all_the_tropes

The link to last working ZIM (if any): https://download.kiwix.org/zim/other/allthetropes_en_all_maxi_2020-10.zim

The error I get: Unable to connect to S3, either S3 login credentials are wrong or bucket cannot be found

benoit74 commented 9 months ago

The error you mention seems to be a transient issue with our S3 storage (used to cache some files). If you look at previous tasks, they mention various articles which fail to download:

https://farm.openzim.org/pipeline/aeffb4b0-9794-44fe-80e6-1c04383a171d/debug failed with "Adjacent_to_This_Complete_Breakfast/Quotes" article
https://farm.openzim.org/pipeline/df638eb9-7039-42ed-89e1-55d093598d88/debug failed with "Sharpe/Tear_Jerker"
https://farm.openzim.org/pipeline/35c2c396-b5dc-4aab-b2e4-6a6161fa2b6a/debug failed with "The_Hebrew_Hammer/YMMV"

I did not have the courage to dig further into the history.

Looks like many articles are not returning properly from the API.

@Popolechien @kelson42: should we spend time identifying a list of failing articles to exclude (like I did for pokemon_fandom_en_all), hope that some change in the scraper/server will fix things, or just say it will never work for this website?
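To make the "list of failing articles to exclude" idea concrete, here is a minimal sketch that pulls the failing titles out of the three task summaries quoted above. The one-title-per-line output format and the helper names (`failing_titles`, `to_ignore_list`) are assumptions for illustration, not something the scraper is known to require.

```python
import re

# The three task summaries from this comment, copied verbatim.
REPORTS = [
    'https://farm.openzim.org/pipeline/aeffb4b0-9794-44fe-80e6-1c04383a171d/debug '
    'failed with "Adjacent_to_This_Complete_Breakfast/Quotes" article',
    'https://farm.openzim.org/pipeline/df638eb9-7039-42ed-89e1-55d093598d88/debug '
    'failed with "Sharpe/Tear_Jerker"',
    'https://farm.openzim.org/pipeline/35c2c396-b5dc-4aab-b2e4-6a6161fa2b6a/debug '
    'failed with "The_Hebrew_Hammer/YMMV"',
]

# Matches the quoted title that follows the words "failed with".
FAILED_TITLE = re.compile(r'failed with "([^"]+)"')


def failing_titles(reports):
    """Return the unique failing titles, in first-seen order."""
    seen = []
    for line in reports:
        match = FAILED_TITLE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def to_ignore_list(titles):
    """Render titles one per line (hypothetical exclusion-list format)."""
    return "\n".join(titles) + "\n"


if __name__ == "__main__":
    print(to_ignore_list(failing_titles(REPORTS)), end="")
```

This only covers the three tasks mentioned here; older tasks in the history would need to be added by hand.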

Popolechien commented 9 months ago

"they mention various articles which fail to download"

Question: you only mention one article per task. Does this mean that each task failed because of a single entry, and that this entry was seemingly random? If yes, how would building a list from past errors guarantee that future tasks will not fail as well, given that this behaviour is not yet understood?

benoit74 commented 9 months ago

The scraper stops on the first failing article, and the order in which articles are fetched is random (two consecutive tries do not give the same order). Trying and retrying until it works would provide the full list of failing articles.

benoit74 commented 9 months ago

(Try, add the failing title to the ignore list, retry, to be exact.)
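The try / ignore / retry loop could be sketched roughly as below. This is only an illustration: `run_scrape` is a hypothetical callable standing in for one full scraper run (it receives the current ignore set and returns the first failing title, or `None` on success), and `make_fake_scrape` simulates the "stops on first failure, random order" behaviour so the loop can be exercised without a real run.

```python
import random
from typing import Callable, Iterable, List, Optional, Set


def collect_failing_titles(
    run_scrape: Callable[[Set[str]], Optional[str]],
    max_runs: int = 1000,
) -> List[str]:
    """Re-run the scrape, ignoring one more failing title each time."""
    ignored: List[str] = []
    for _ in range(max_runs):
        failed = run_scrape(set(ignored))
        if failed is None:
            return ignored  # the run completed: the list is final
        if failed in ignored:
            # Ignoring the title did not help, so stop instead of looping.
            raise RuntimeError(f"{failed!r} failed although it was ignored")
        ignored.append(failed)
    raise RuntimeError(f"still failing after {max_runs} runs")


def make_fake_scrape(
    all_titles: Iterable[str], broken: Set[str], seed: int = 0
) -> Callable[[Set[str]], Optional[str]]:
    """Simulated scraper: random fetch order, stops on first broken title."""
    titles = list(all_titles)
    rng = random.Random(seed)

    def run(ignored: Set[str]) -> Optional[str]:
        order = [t for t in titles if t not in ignored]
        rng.shuffle(order)
        for title in order:
            if title in broken:
                return title
        return None

    return run


if __name__ == "__main__":
    titles = ["Main_Page", "Sharpe/Tear_Jerker", "The_Hebrew_Hammer/YMMV"]
    broken = {"Sharpe/Tear_Jerker", "The_Hebrew_Hammer/YMMV"}
    print(sorted(collect_failing_titles(make_fake_scrape(titles, broken))))
```

Under these assumptions, N consistently failing articles need N + 1 runs, which is where the cost of this approach comes from.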

Popolechien commented 9 months ago

Ok, so we know that no matter the order, a given article will always fail. Then yeah, worth a try, but if this is entirely manual work, won't it be extremely tedious? I'm tempted to suggest we discuss this at the next team meeting, just in case I'm missing some info/context/background.