openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

New request: create one "all TED videos" ZIM #969

Closed benoit74 closed 1 week ago

benoit74 commented 3 months ago

We are close to publishing many TED ZIMs with videos filtered per topic (i.e. we will have one ZIM per TED topic).

What about creating one "all TED videos" ZIM, where we would fetch TED videos of all topics, no matter the associated topic?

I would personally prefer to use such a ZIM (even if quite big, probably in the order of 100GB) rather than choosing which topic I prefer and having videos duplicated between the ZIMs (because the topics I like are in fact close to one another and multiple videos are hence present in multiple topics / ZIMs I've chosen).

Popolechien commented 3 months ago

LGTM but it needs to be clearly identified as The TED Zim To Bind Them All or somesuch so that people don't download that along with the rest.

Popolechien commented 3 months ago

@RavanJAltaie thoughts?

benoit74 commented 3 months ago

I propose:

RavanJAltaie commented 3 months ago

I agree with @benoit74's suggestion, but honestly I don't know why we should do this. This will be a very big file with ultimately few people who will want/be able to download it. So what's the point?

Popolechien commented 3 months ago

> I would personally prefer to use such a ZIM (even if quite big, probably in the order of 100GB) rather than choosing which topic I prefer and having videos duplicated between the ZIMs

I am not convinced either but then the cost is minimal and in a long-tail scenario I guess we'll always have users going for one or the other.

benoit74 commented 2 months ago

> This will be a very big file with ultimately few people who will want/be able to download it. So what's the point?

I don't think it will be rarely used. Let's say that I'm a target user. I'm interested in science, technology, society and global issues. I'm fairly sure (almost ready to bet 10 bucks 😅) that the sum of the sizes of the science, technology, society and global issues ZIMs is going to be larger than the size of this "all" ZIM, because lots of videos are redundant. This is going to be even worse if I want a few more topics, because there are some niche topics which are not fully covered by the topics already mentioned but still interest me, like, say, innovation and climate change. And that's without even mentioning that having all these "small" ZIMs is a pain in terms of search / video navigation compared to one single big ZIM.
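To illustrate the argument above, here is a minimal sketch comparing the size of several per-topic downloads with the size of a single merged collection. The video ids, sizes and topic memberships are entirely made up for the example; they are not taken from the TED catalogue or from any real ZIM.

```python
# Hypothetical data: video id -> size in MB, and topic -> set of video ids.
video_size_mb = {"v1": 300, "v2": 250, "v3": 400, "v4": 150}
topic_videos = {
    "science": {"v1", "v2", "v3"},
    "technology": {"v2", "v3", "v4"},
    "society": {"v1", "v4"},
}


def per_topic_total(topics):
    """Total size when each topic is downloaded separately (duplicates count every time)."""
    return sum(video_size_mb[v] for t in topics for v in topic_videos[t])


def merged_total(topics):
    """Total size when each distinct video is stored only once."""
    distinct = set().union(*(topic_videos[t] for t in topics))
    return sum(video_size_mb[v] for v in distinct)


selected = ["science", "technology", "society"]
separate = per_topic_total(selected)
merged = merged_total(selected)
print(f"separate downloads: {separate} MB, merged: {merged} MB")
```

The more the selected topics overlap, the larger the gap between the two totals becomes.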

benoit74 commented 3 weeks ago

Recipe created and requested: https://farm.openzim.org/recipes/ted_topic_all

benoit74 commented 3 weeks ago

Task failed due to https://github.com/openzim/ted/issues/213

I passed the list of all topics "manually".

It made me realize there are 5 new topics (generosity, wildlife, reproductive health, artificial intelligence, tech). I've created the corresponding recipes and they've started automatically.

Popolechien commented 3 weeks ago

> It made me realize

I take it you realized it because these names struck you as new, but there is no automated way to know when a new topic appears?

benoit74 commented 3 weeks ago

> I take it you realized it because these names struck you as new, but there is no automated way to know when a new topic appears?

Just the total number of recipes was displayed and it said "360" where I knew it was supposed to be "355".

Re-creating the missing recipes is mostly automated; I "just" need to run a script on my machine. I still prefer not to fully automate this, since it is quite important to check before actually requesting the new topics, in case the tool goes wild for instance.

I propose to set up a workflow that creates a quarterly issue to update the list of recipes (running this takes me between 5 minutes, if nothing needs to be done, and 15 minutes, if new recipes need to be created and checked). Would that be ok?
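For illustration only, the comparison behind such a periodic check could be as simple as a set difference between the topics that already have a recipe and the topics currently listed upstream. This is a sketch, not the actual script mentioned above: the function name `diff_topics` and both input lists are hypothetical, and a real run would load them from the existing recipes and from the TED topic list.

```python
def diff_topics(known, current):
    """Return (new, removed) topic names, sorted for stable output."""
    known_set, current_set = set(known), set(current)
    return sorted(current_set - known_set), sorted(known_set - current_set)


# Hypothetical inputs, hard-coded for the example.
known_topics = ["science", "technology", "society"]
current_topics = ["science", "technology", "society", "wildlife", "generosity"]

new, removed = diff_topics(known_topics, current_topics)
print("new topics to review before requesting recipes:", new)
print("topics no longer listed upstream:", removed)
```

Printing the difference rather than acting on it keeps the manual review step in place.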

Popolechien commented 3 weeks ago

LGTM

RavanJAltaie commented 3 weeks ago

LGTM as well.

benoit74 commented 1 week ago

ZIM ready at https://library.kiwix.org/#lang=&q=all+ted

File size is a little bit less than 79GB. Interesting, since the total size of all other TED ZIMs is 419GiB.

Popolechien commented 1 week ago

> File size is a little bit less than 79GB

Do we know how many videos are in there? I suspect the reduced size tells us a lot about how many videos are duplicated across several topic lists.

Popolechien commented 1 week ago

Ah, I see 165 pages with 40 videos each = 6600 TEDs
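As a rough back-of-the-envelope check of the figures quoted in this thread, the sketch below recomputes the video count and the ratio between the per-topic ZIMs and the "all" ZIM. It assumes that "GB" means 10^9 bytes and "GiB" means 2^30 bytes, which the thread does not confirm, and it treats "a little bit less than 79GB" as exactly 79 GB, so the ratio is only approximate.

```python
pages = 165
videos_per_page = 40
total_videos = pages * videos_per_page  # assumes every page is full

all_zim_bytes = 79 * 10**9        # "a little bit less than 79GB", taken as an upper bound
topic_zims_bytes = 419 * 2**30    # "419GiB" for all other TED ZIMs combined

# How many times larger the per-topic ZIMs are, taken together.
ratio = topic_zims_bytes / all_zim_bytes

print(f"{total_videos} videos, per-topic ZIMs are about {ratio:.1f}x the size of the 'all' ZIM")
```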

kelson42 commented 1 week ago

This can also be checked with the "Counter" ZIM metadata

benoit74 commented 1 week ago

Yep, see https://library.kiwix.org/raw/ted_mul_all_2024-07/meta/Counter
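For anyone who wants to do this programmatically, here is a small sketch that parses a Counter value, assuming it is a semicolon-separated list of `mimetype=count` pairs. The sample string is made up and is not the content of the URL above, and `parse_counter` is a hypothetical helper name.

```python
def parse_counter(raw):
    """Parse 'mime=count;mime=count' into a dict mapping MIME type to int count."""
    counts = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        mime, _, count = item.rpartition("=")
        counts[mime] = int(count)
    return counts


# Made-up sample value, for illustration only.
sample = "text/html=12;video/webm=5;image/webp=7"
counts = parse_counter(sample)

# Sum every entry whose MIME type is a video type.
video_entries = sum(n for mime, n in counts.items() if mime.startswith("video/"))
print(counts, video_entries)
```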