openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

edu.gcfglobal.org ZIMs are all missing some content #1052

Open benoit74 opened 1 week ago

benoit74 commented 1 week ago

ZIM(s) location

https://library.kiwix.org/#lang=&q=gcf

Recipe(s) URL

https://farm.openzim.org/recipes?name=edu.gcfglobal.org

Readers tested

Which ZIM versions are impacted?

All PROD versions are impacted

Details

Content inside courses is loaded lesson by lesson.

For instance, on https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/1/, you start with a "Continue" button at the bottom of the page. When you click on it, Lesson 2 is loaded and appears, including a new "Continue" button at the bottom of the page. And so on.

Currently, none of the content behind the "Continue" buttons is available in the ZIM (see https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06/edu.gcfglobal.org/en/beginning-a-new-career/how-to-decide-on-a-career-field/1/). The problem is that the crawler had no idea that more content was hidden behind these "Continue" buttons.

The typical solution to this problem is to develop a custom behavior for Browsertrix Crawler that fakes clicks on these buttons so that the crawler fetches the corresponding content.
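To make the idea concrete, here is a minimal sketch of the click loop such a behavior would run. It is not Browsertrix Crawler's actual behavior API: the `Page` interface, `click_through`, and `FakeCoursePage` are all hypothetical names used for illustration, and a real behavior would also have to wait for each lesson to finish loading before looking for the next button.

```python
from typing import Protocol


class Page(Protocol):
    """Hypothetical minimal page interface; NOT Browsertrix Crawler's real API."""

    def has_button(self, label: str) -> bool: ...

    def click_button(self, label: str) -> None: ...


def click_through(page: Page, label: str = "Continue", max_clicks: int = 100) -> int:
    """Click the button until it disappears; return the number of clicks made.

    max_clicks is a safety cap so that a page which always shows the button
    cannot keep the crawler looping forever.
    """
    clicks = 0
    while clicks < max_clicks and page.has_button(label):
        page.click_button(label)  # on the real site this loads the next lesson
        clicks += 1
    return clicks


class FakeCoursePage:
    """In-memory stand-in for a course page with a fixed number of lessons."""

    def __init__(self, lessons: int) -> None:
        self.lessons = lessons
        self.loaded = 1  # the first lesson is part of the initial page

    def has_button(self, label: str) -> bool:
        # The button is shown as long as some lessons are still not loaded.
        return label == "Continue" and self.loaded < self.lessons

    def click_button(self, label: str) -> None:
        self.loaded += 1
```

With the fake page, `click_through(FakeCoursePage(4))` clicks three times, after which all four lessons are loaded.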

Note for self: the URL loaded by the "Continue" button seems to be protected by a timestamp, e.g. https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/content/?_=1718454918606 ; a fuzzy rule to remove it is most probably required.
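As an illustration of what such a rule needs to achieve, the sketch below strips the `_` query parameter so that requests differing only by timestamp map to the same URL. It only shows the intended normalization, not the format of an actual fuzzy rule; `strip_cache_buster` is a hypothetical helper, and treating `_` as the only varying parameter is an assumption based on the example URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cache_buster(url: str, param: str = "_") -> str:
    """Return url without the given query parameter (assumed to be a timestamp)."""
    parts = urlsplit(url)
    # Keep every query parameter except the cache-busting one.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = (
    "https://edu.gcfglobal.org/en/beginning-a-new-career/"
    "transferring-your-skills-to-a-new-career/content/?_=1718454918606"
)
print(strip_cache_buster(url))
```

Two fetches of the same lesson made at different times then normalize to the same `.../content/` URL, which is what lets a recorded response be matched at replay time.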

@Popolechien @RavanJAltaie shall we keep the ZIMs in production even if courses are incomplete? No one complains, and YouTube videos are present, so it is not like we have nothing, but the content is clearly incomplete. I wouldn't recommend deleting them, since their limited content already provides some value and it might take some time until I develop the custom behavior.

benoit74 commented 1 week ago

Note: it looks like properly creating a ZIM of this website might also be broken; even Zimit1 is failing to retrieve YouTube videos in my latest tests ...

benoit74 commented 1 week ago

Bump. Should I delete the production files, which are significantly broken?

Popolechien commented 1 week ago

I'd say nobody complains because they're offline. But offering significantly broken ZIM files when downloading them has a cost is not appropriate.

benoit74 commented 1 week ago

You're right, so I've deleted the ZIMs from the library. Let's start over with the creation of these ZIMs and publish them once they are really ready:

kelson42 commented 1 week ago

@benoit74 @rgaudin Two hours have passed and the files are still in the library! Why?

rgaudin commented 1 week ago

library-refresh is broken. pokemonwiki_en_all_maxi was moved from other to zimit without a ticket being opened, which is what the procedure requires (and this is a known limitation of the tool). @benoit74 is probably better informed, so he'll follow up.

benoit74 commented 1 week ago

It's the opposite: a ticket was opened in which we decided to move the file from zimit to other, but I forgot to update the recipe, so the next ZIM, created 3 days ago, was pushed to ... zimit again 😫

The situation is fixed; the library will refresh soon.