benoit74 opened 1 week ago
Note: it looks like properly creating a ZIM of this website may also be broken; even Zimit1 failed to retrieve YouTube videos in my latest tests...
UP? Should I delete production files that are significantly broken?
I'd say nobody complains because they're offline. But offering significantly broken ZIM files is not appropriate when downloading them has a cost.
You're right, so I've deleted the ZIMs from the library. Let's start over with creating these ZIMs and publish them once they are really ready:
@benoit74 @rgaudin Two hours have passed and the files are still in the library! Why?
library-refresh is broken. pokemonwiki_en_all_maxi was moved from "other" to "zimit" without opening a ticket, as the procedure requires (this is a known limitation of the tool).
@benoit74 is probably better informed, so he'll follow up.
It's the opposite: a ticket was opened where we decided to move the file from "zimit" to "other", but I forgot to update the recipe, so the next ZIM, created 3 days ago, was pushed to... "zimit" again 😫
The situation is fixed; the library will refresh soon.
ZIM(s) location
https://library.kiwix.org/#lang=&q=gcf
Recipe(s) URL
https://farm.openzim.org/recipes?name=edu.gcfglobal.org
Readers tested
Which ZIM versions are impacted?
All PROD versions are impacted
Details
Content inside courses is loaded lesson by lesson.
For instance, on https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/1/, you start with a "Continue" button at the bottom of the page. When you click on it, Lesson 2 is loaded and appears, including a new "Continue" button at the bottom of the page. And so on.
Currently, none of the content behind "Continue" buttons is available in the ZIM (see https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06/edu.gcfglobal.org/en/beginning-a-new-career/how-to-decide-on-a-career-field/1/). The problem is that the crawler had no way of knowing that more content was hidden behind these "Continue" buttons.
The typical solution is to develop a custom behavior for Browsertrix Crawler that simulates clicks on these buttons, so that the crawler fetches the corresponding content.
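The core of such a behavior is a loop: click "Continue" while the button is still present, wait for the next lesson to load, and stop once no button remains. Below is a minimal sketch of that loop only, assuming a hypothetical `LessonPage` interface that is NOT a Browsertrix Crawler API. The names `LessonPage`, `hasContinueButton`, `clickContinue`, `expandAllLessons` and the `maxClicks` cap are all illustrative. The DOM selector for the button on edu.gcfglobal.org, the way to wait for a lesson to appear, and how the loop is packaged as an actual Browsertrix Crawler custom behavior are not shown; check the Browsertrix Crawler documentation for those.

```typescript
// Hypothetical abstraction over a course page (NOT a Browsertrix API).
interface LessonPage {
  // True while a "Continue" button is still present on the page.
  hasContinueButton(): boolean;
  // Clicks the button and resolves once the next lesson has been inserted.
  clickContinue(): Promise<void>;
}

// Clicks "Continue" until it disappears, so that every lesson request is
// issued (and therefore captured by the crawler). The cap guards against
// a page whose button never goes away. Returns the number of clicks made.
async function expandAllLessons(page: LessonPage, maxClicks = 50): Promise<number> {
  let clicks = 0;
  while (clicks < maxClicks && page.hasContinueButton()) {
    await page.clickContinue();
    clicks++;
  }
  return clicks;
}
```

In a real behavior, `hasContinueButton` would query the DOM and `clickContinue` would dispatch the click and wait for the new lesson's content; keeping that behind an interface makes the loop itself easy to check against a fake page.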
Note to self: the URL loaded by the "Continue" button seems to carry a timestamp parameter, e.g. https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/content/?_=1718454918606; a fuzzy rule to remove it is most probably required.
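The `_=1718454918606` parameter looks like a cache-busting millisecond timestamp, which would make every captured URL unique and prevent the ZIM from finding the content when it is requested again with a different timestamp. What such a fuzzy rule has to achieve is mapping all these variants to one canonical URL. A minimal sketch of that normalization, assuming the parameter is always named `_` with a purely numeric value; `stripCacheBuster` is a hypothetical helper, and how the rule is actually expressed in the scraper's fuzzy-rule configuration is not shown here.

```typescript
// Hypothetical helper: drops "_=<digits>" cache-busting parameters so that
// every timestamped variant of a URL maps to the same canonical URL.
function stripCacheBuster(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  new URLSearchParams(url.search).forEach((value, key) => {
    // Keep every parameter except "_" carrying a numeric timestamp.
    if (!(key === "_" && /^\d+$/.test(value))) {
      kept.append(key, value);
    }
  });
  // Assigning an empty string to url.search removes the "?" entirely.
  url.search = kept.toString();
  return url.toString();
}
```

Other query parameters are re-serialized by `URLSearchParams`, so their encoding may change slightly (for example, spaces become `+`); that is acceptable for a sketch but worth keeping in mind for a real rule.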
@Popolechien @RavanJAltaie shall we keep the ZIM in production even though courses are incomplete? No one has complained, and YouTube videos are present, so it is not as if we have nothing, but it is clearly incomplete. I wouldn't recommend deleting them, since their limited content already provides some value, and it might take some time before I develop the custom behavior.