ZIM(s) location
https://dev.library.kiwix.org/viewer#teoria.com_en_2024-06
Recipe(s) URL
https://farm.openzim.org/recipes/teoria.com_en
Readers tested
Both ZIM versions impacted?
No, only the latest version is impacted.
Details
The problem was discovered in the dev library during the Zimit 2 rollout. The prod ZIM is OK for some unknown reason.
Some MP3 files are missing from the ZIM because they are missing from the WARC, and they are missing from the WARC because the crawler does not retrieve them.
A sample page that uses these MP3 files is https://www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php
This page works in the prod ZIM (at https://library.kiwix.org/viewer#teoria.com_en_2024-02/A/www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php) but not in dev (at https://dev.library.kiwix.org/viewer#teoria.com_en_2024-06/www.teoria.com/en/tutorials/functions/nonharmonic-tones/02-passing.php).
The MP3 files play when the images are clicked.
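One way to confirm exactly which MP3 files the crawl missed is to diff the MP3 URLs referenced by the page source against the URLs captured in the WARC. Below is a minimal sketch that assumes the MP3 paths appear as quoted string literals in the HTML. If the site builds them in JavaScript at runtime, this will find nothing. The `missing_mp3_urls` helper and the example.org URLs are hypothetical, and the set of captured URLs has to be extracted separately, for example from the `WARC-Target-URI` headers of the crawl's response records.

```python
import re
from urllib.parse import urljoin

# Assumption: MP3 references appear in the page source as quoted literals,
# e.g. src="/audio/x.mp3" or play('sounds/y.mp3').
MP3_PATTERN = re.compile(r"""["']([^"'\s]+\.mp3)["']""", re.IGNORECASE)


def missing_mp3_urls(page_url, page_html, captured_urls):
    """Hypothetical helper: return MP3 URLs referenced by the page but absent from the crawl."""
    # Resolve relative references against the page URL so they match WARC target URIs.
    referenced = {urljoin(page_url, ref) for ref in MP3_PATTERN.findall(page_html)}
    # Anything referenced but never captured is a candidate for what the crawler skipped.
    return sorted(referenced - set(captured_urls))


if __name__ == "__main__":
    # Hypothetical page markup and crawl contents, for illustration only.
    html = """<img onclick="play('sounds/a.mp3')"><audio src="/audio/b.mp3"></audio>"""
    captured = {"https://example.org/audio/b.mp3"}
    print(missing_mp3_urls("https://example.org/lesson/page.php", html, captured))
```

Running this against the sample page's HTML and the dev crawl's captured URLs could pinpoint the skipped MP3 files. A plain set difference keeps the check independent of any particular WARC tooling.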
The root cause of the MP3 files not being crawled is somewhat odd and probably linked to the website's behavior. When I load the page in my browser, the MP3 files are sometimes eagerly loaded with the page, and sometimes lazy-loaded only when I click an image. For instance, every time I hard-refresh the page with Cmd+Shift+R, the MP3 files are lazy-loaded. But when I navigate to the page, or enter its URL in the browser's address bar, they are eagerly loaded. Kind of strange...
The recipe is disabled for now. We need to discuss the priority of fixing this; I'm not sure the website gets many updates. The only concern is that, for now, we are stuck on Zimit 1.
Maybe we should try crawling the website again with an older crawler version, since the production ZIMs are OK... or is this linked to a change in the website's behavior?