openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

New request: create ZIMs of all TEDx talks #1152

Open benoit74 opened 2 weeks ago

benoit74 commented 2 weeks ago

Currently, the TED scraper / Zimfarm configurations only scrape the official TED talks published on the TED website. This means about 6.6K individual videos.

Only a few TEDx talks are included (e.g. 567 videos in https://download.kiwix.org/zim/ted/ted_mul_tedx_2024-08.zim), but this is only a very small fraction of the 221K TEDx videos hosted on the official YouTube channel: https://www.youtube.com/Tedxtalks

AFAIK, the TEDx talks hosted on ted.com are the official ones endorsed by the TED organization, whereas the TEDx talks on the YouTube channel come from conferences organized by independent organizations that only reuse the brand (with permission).

I would like us to provide all these TEDx talks as ZIMs as well. We obviously need to discuss a strategy for creating ZIMs that are practical to handle (in terms of size) and to search for content.

RavanJAltaie commented 22 hours ago

@benoit74 the only problem with this is that we have nearly a hundred thousand TEDx talks in English; if we count other languages, we will reach millions. @Popolechien do you think this is feasible?

Popolechien commented 17 hours ago

Well, I see that TEDx in Portuguese is already near 5,000 videos. Is there a way to find out whether there is some sort of curation by topic that we could also scrape?

benoit74 commented 14 hours ago

I do not have any more ways to find out than you do.

benoit74 commented 14 hours ago

We already agreed in the past that we should avoid having multiple languages per ZIM, so I still think that focusing only on per-language playlists is already interesting.

Btw, where do you see 5000 videos in Portuguese?

I see only 364: [image]

But anyway, I don't think 5000 videos are impossible to scrape, even if it surely has an associated cost.

We could maybe focus on languages that are poorly covered by TED, are common in our known user base, and have a modest number of videos.

Popolechien commented 10 hours ago

Yours says "More! videos in Portuguese". But on the front page they have the Portuguese one (as well as Spanish (4,947 videos) and Hindi (1,987)). [Screenshot 2024-09-19 at 14 53 03]

benoit74 commented 9 hours ago

They have two playlists for Portuguese...

https://youtube.com/playlist?list=PLsRNoUx8w3rMzRnIIYOsv-oYXbIHqDNk_
https://youtube.com/playlist?list=PLsRNoUx8w3rOwHx4kVJL5ksS9vTxj5hXn

Aside from the number of videos, I don't know what the difference is...
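
To make the "modest number of videos" idea from earlier in the thread concrete, here is a rough Python sketch that filters per-language playlists by size. The video counts are the ones quoted in this thread; the `Playlist` type, the `modest_playlists` helper, and the 2,000-video threshold are all hypothetical and only for illustration. The other two criteria (poor coverage by TED, popularity in our user base) are not modeled because nobody has posted data for them.

```python
from dataclasses import dataclass


@dataclass
class Playlist:
    language: str
    video_count: int


# Counts quoted in this thread: Spanish and Hindi from the channel front page,
# Portuguese from the smaller of the two Portuguese playlists.
PLAYLISTS = [
    Playlist("Spanish", 4947),
    Playlist("Hindi", 1987),
    Playlist("Portuguese", 364),
]

# Hypothetical cut-off; nobody in the thread has proposed an actual number.
MAX_VIDEOS = 2000


def modest_playlists(playlists, max_videos=MAX_VIDEOS):
    """Return playlists at or below max_videos, smallest first."""
    kept = [p for p in playlists if p.video_count <= max_videos]
    return sorted(kept, key=lambda p: p.video_count)


selected = modest_playlists(PLAYLISTS)
for playlist in selected:
    print(f"{playlist.language}: {playlist.video_count} videos")
```

Sorting smallest first would let us start with the cheapest scrapes and measure the real cost before committing to the larger playlists.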