openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

teoria.com_en is failing #1040

Open benoit74 opened 2 weeks ago

benoit74 commented 2 weeks ago

ZIM(s) location

https://dev.library.kiwix.org/viewer#teoria.com_en_2024-06

Recipe(s) URL

https://farm.openzim.org/recipes/teoria.com_en

Readers tested

Both ZIM versions impacted?

No, only the latest version is impacted

Details

The problem was discovered in the dev library during the Zimit 2 rollout. The prod ZIM is OK, for an unknown reason.

Some MP3 files are not present in the ZIM, because they are not present in the WARC, because they are not retrieved by the crawler.

A sample page that uses these MP3s is https://www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php

This page is working in prod ZIM (at https://library.kiwix.org/viewer#teoria.com_en_2024-02/A/www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php) but not in dev (at https://dev.library.kiwix.org/viewer#teoria.com_en_2024-06/www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php)

On a working page, an MP3 plays when we click one of the images.
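To see which MP3s the crawler missed on a given page, one option is to compare the MP3 references found in the page source against the list of URLs captured in the WARC. The snippet below is only a minimal sketch of that comparison: the function names, the sample markup, and the `snd/passingN.mp3` paths are made up for illustration, and it assumes the MP3 URLs appear as quoted strings in the page source (the real site may build them differently in JavaScript). The list of captured URLs is assumed to have been extracted from the WARC beforehand.

```python
import re
from urllib.parse import urljoin

# Matches quoted strings ending in .mp3, optionally followed by a query string.
MP3_REF = re.compile(r"""["']([^"'\s]+\.mp3(?:\?[^"']*)?)["']""", re.IGNORECASE)


def referenced_mp3s(page_url, page_source):
    """Return the absolute URLs of all quoted .mp3 references in the page source."""
    return {urljoin(page_url, ref) for ref in MP3_REF.findall(page_source)}


def missing_mp3s(page_url, page_source, captured_urls):
    """Return the referenced MP3 URLs that are absent from the captured URLs."""
    return sorted(referenced_mp3s(page_url, page_source) - set(captured_urls))


# Hypothetical data: the markup and the snd/ paths are NOT taken from teoria.com.
PAGE_URL = "https://www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php"
SAMPLE_SOURCE = """
<img src="img/ex1.png" onclick="play('snd/passing1.mp3')">
<img src="img/ex2.png" onclick="play('snd/passing2.mp3')">
"""
CAPTURED = [
    PAGE_URL,
    "https://www.teoria.com/en/tutorials/functions/nonharmonic-tones/snd/passing1.mp3",
]

if __name__ == "__main__":
    for url in missing_mp3s(PAGE_URL, SAMPLE_SOURCE, CAPTURED):
        print("missing:", url)
```

Running the same comparison against the URL lists of the 2024-02 and 2024-06 crawls would show whether the same files are missing on every affected page or only on some of them.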

The root cause of the MP3s not being crawled is a bit weird and is probably linked to website behavior. When I load the page in my browser, the MP3s are sometimes loaded eagerly with the page and sometimes lazy-loaded when I click on an image. For instance, every time I refresh the page with Cmd+Shift+R, the MP3s are lazy-loaded. But when I navigate to the page or enter the page URL in the browser URL bar, the MP3s are eagerly loaded. Kind of strange ...
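One way to narrow this down might be to check whether the server itself answers differently for a hard reload. Hard reloads in Chrome and Firefox typically add `Cache-Control: no-cache` and `Pragma: no-cache` to the request, so the sketch below fetches the page with and without those headers and compares the bodies. This is only a hypothesis to test, not a confirmed cause; the function names and the User-Agent string are assumptions made for the example.

```python
import urllib.request

PAGE_URL = "https://www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php"


def request_headers(hard_reload):
    """Build request headers, adding the no-cache headers a hard reload usually sends."""
    headers = {"User-Agent": "Mozilla/5.0 (diagnostic sketch)"}  # placeholder value
    if hard_reload:
        headers["Cache-Control"] = "no-cache"
        headers["Pragma"] = "no-cache"
    return headers


def fetch_page(url, hard_reload, timeout=30):
    """Fetch the page body as text, emulating a normal navigation or a hard reload."""
    req = urllib.request.Request(url, headers=request_headers(hard_reload))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    normal = fetch_page(PAGE_URL, hard_reload=False)
    hard = fetch_page(PAGE_URL, hard_reload=True)
    print("identical bodies:", normal == hard)
```

If both bodies are identical, the eager versus lazy loading is more likely decided by client-side script or browser cache state than by the server response, which would point the investigation at how the crawler's browser loads the page.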

The recipe is disabled for now. We need to discuss the priority of fixing this; I'm not sure the website sees many updates. The only concern is that, for now, we are blocked on Zimit 1.

Maybe we should try to crawl the website again with an older crawler, since the production ZIMs are OK ... or is this linked to a change in website behavior?