openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

TED: update all ZIMs by topic #789

Open benoit74 opened 6 months ago

benoit74 commented 6 months ago

As discussed in many meetings / issues, we will:

I will assist on this by automating the creation of the recipe.

What we have agreed:

Since the scraper does not support grabbing videos in all languages, we will use a trick:

We will use the --low-quality option for now, since the scraper cannot really produce high-quality videos anyway (see https://github.com/openzim/ted/issues/160).

benoit74 commented 6 months ago

I've created the first recipe with an automated tool: https://farm.openzim.org/recipes/ted_topic_street-art

Note that I have created a new Zimfarm tag so that it is now easier to find all Ted recipes which grab a topic: https://farm.openzim.org/recipes?tag=ted-by-topic

All next recipes will be created with the same settings (except topic which will change of course).
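As a rough illustration of what "same settings, different topic" could look like, here is a minimal sketch. It only builds recipe definitions in memory and does not talk to the Zimfarm. The `ted_topic_<topic>` naming, the `ted-by-topic` tag and the `--low-quality` choice come from this thread; the dictionary layout and the key names `low-quality` and `topics` are assumptions made for the example, not the actual recipe schema.

```python
from copy import deepcopy

# Hypothetical template: settings shared by every recipe.
# Key names are illustrative, not the real Zimfarm recipe schema.
TEMPLATE = {
    "tags": ["ted-by-topic"],
    "config": {"low-quality": True, "topics": None},
}


def build_recipe(topic: str) -> dict:
    """Return a copy of the template with only the topic-specific parts set."""
    recipe = deepcopy(TEMPLATE)  # deep copy so the nested config is not shared
    recipe["name"] = f"ted_topic_{topic}"
    recipe["config"]["topics"] = topic
    return recipe


# Two topic slugs mentioned in this thread, used here as sample input.
recipes = [build_recipe(topic) for topic in ["street-art", "3d-printing"]]
for recipe in recipes:
    print(recipe["name"], recipe["config"])
```

Copying the template for each topic keeps the shared settings in one place, so a later change (for instance dropping the low-quality setting) would only need to be made once.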

@RavanJAltaie could you please:

Once I have your confirmation, I will create the 354 remaining recipes.

RavanJAltaie commented 5 months ago

@benoit74 I've checked the recipe parameters; they all look fine (AFAIK). The ZIM file is working fine and 4 videos are there in English, but when I filter the languages to French, Chinese or Spanish, nothing happens: the same videos in English are there. When I filter to German, 1 video disappears and 3 videos are left.

benoit74 commented 5 months ago

Thank you!

About the filter by language: AFAIK this is the "normal" behavior. All videos have English audio and French, Chinese and Spanish subtitles, while only 3 videos out of 4 have German subtitles.
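To make the behavior described above concrete, here is a minimal sketch of a filter that keeps a video when the selected language is available as either audio or subtitles. This is not the scraper's actual UI code; the `Video` class, the `filter_by_language` function and the sample titles are hypothetical, and the data simply mirrors the 4-video case reported in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    title: str
    audio: set[str]
    subtitles: set[str] = field(default_factory=set)


def filter_by_language(videos: list[Video], lang: str) -> list[Video]:
    """Keep videos offering `lang` as audio OR as subtitles."""
    return [v for v in videos if lang in v.audio or lang in v.subtitles]


# Sample data: all 4 videos have English audio and French, Chinese and
# Spanish subtitles; only 3 of them also have German subtitles.
videos = [
    Video("talk-1", {"en"}, {"fr", "zh", "es", "de"}),
    Video("talk-2", {"en"}, {"fr", "zh", "es", "de"}),
    Video("talk-3", {"en"}, {"fr", "zh", "es", "de"}),
    Video("talk-4", {"en"}, {"fr", "zh", "es"}),
]

for lang in ("en", "fr", "de"):
    print(lang, [v.title for v in filter_by_language(videos, lang)])
```

With this rule, selecting French returns the same 4 videos as English (nothing appears to change), while selecting German drops the one video without German subtitles.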

I agree this is far from intuitive, but I don't think this is something new.

Please check another TED ZIM created before.

If everything is OK (i.e. there is nothing new here; we know that the UI for TED and YouTube videos must be enhanced), please move the recipe to PROD and confirm I can create all remaining recipes.

benoit74 commented 5 months ago

We agreed last Thursday that I may proceed.

However I now have ~two~ three doubts:

RavanJAltaie commented 5 months ago

Hello @benoit74

benoit74 commented 5 months ago

All 354 new recipes (out of 355) have been created; they should start soon.

benoit74 commented 4 months ago

I also propose that we create a ted_topic_all recipe which will create one single ZIM with all TED videos (easily doable by setting the topic to "all"). Some people (at least me) might be interested to fetch this big ZIM rather than many smaller ones with overlapping videos across them. WDYT?

benoit74 commented 1 month ago

TED scraper 3.0.2 is now ready with a new encoder and, hopefully, almost everything fixed for a proper release of these ZIMs. I built and tested a few ZIMs over the weekend and they looked perfect.

I hence re-enabled, reconfigured and enabled all TED recipes by topic with a letter + ted_topic_3d-printing to publish new ZIMs to production.

@RavanJAltaie @Popolechien could you please have a look at these ZIMs as well once they hit production?

If quality is confirmed to be OK, I propose that we go as planned and enable and request all recipes of TED by topic (I can do it programmatically quite easily, no need to click 355 times on the "request" button).

kelson42 commented 1 month ago

@benoit74 Please share one or two URLs of ZIMs with VP9 when the first ones are online.

Popolechien commented 1 month ago

Yes, I was going to ask what it is we should specifically test for (sound on macOS, isn't it?)

benoit74 commented 1 month ago

Please share one or two URLs of ZIMs with VP9 when the first ones are online.

You can already try on dev for instance:

They will not look much different in prod

Yes, I was going to ask what it is we should specifically test for (sound on macOS, isn't it?)

Everything on every platform ^^ (but yes, video and sound on iOS would be good to confirm again; I already tested on macOS latest Testflight)

benoit74 commented 1 month ago

New ZIMs are ready in production, for instance:

Popolechien commented 1 month ago

Tested on macOS, Kiwix-JS, Kiwix-JS PWA: no problem

TED anthropocene mul renders weirdly in the library (though it's probably an Android / mul issue):

Screenshot_20240624-213904

Tbh if the reason it is considered multilingual is the presence of subtitles then I would consider it wrong.

Jaifroid commented 1 month ago

I tested the Addiction ZIM on Kiwix Desktop and KJS. All functional. The pointer arrow disappeared for me with Kiwix Desktop in fullscreen video mode, but it's minor and may be OS-dependent anyway. I doubt it is ZIM-specific.

All works well on Kiwix JS Browser Extension. On the PWA, there is a small bug when using the dropdown language selector on the home screen: it thinks the user is clicking an external link. Having set a language, the PWA also appears to forget the language selection when clicking through to a video. I'll open issues for these: they're small issues on my side.

benoit74 commented 1 month ago

Could you share the link where you saw this? Is it in the Android catalog? Then we should probably open an Android issue to discuss the topic. Having multiple languages in a ZIM is not supposed to become a problem.

Regarding your point about why we considered this multilingual: yes, it is because of the presence of either audio or subtitles (and for most languages in the list, it is only subtitles). I don't get why you consider it wrong. We should probably open a dedicated issue to discuss this in either openzim/ted or, maybe even better, in openzim/overview, since there are absolutely no guidelines on this matter AFAIK. And as a French person, when I select "Français" in the library, I would like to see all ZIMs which are usable by someone who only listens to / reads French. These TED ZIMs comply with this expectation, so this is not something easy to rule out without more arguments.

benoit74 commented 1 month ago

On the PWA, there is a small bug when using the dropdown language selector on the home screen: it thinks the user is clicking an external link. Having set a language, the PWA also appears to forget the language selection when clicking through to a video. I'll open issues for these: they're small issues on my side.

Just be aware that this old UI might sooner or later be deprecated in favor of the new UI that is currently being developed for YouTube. So do not spend too much time on this.

Jaifroid commented 1 month ago

Just be aware that this old UI might sooner or later be deprecated in favor of the new UI that is currently being developed for YouTube. So do not spend too much time on this.

OK, thanks! These bugs nevertheless reveal flaws in the underlying code, so a generic fix will still be useful.

Jaifroid commented 1 month ago

Tbh if the reason it is considered multilingual is the presence of subtitles then I would consider it wrong.

Not just subtitles, but the language of video descriptions and the video information pages. Seems like a prima facie case of being a multilingual ZIM to me (but I know we have somewhat different perspectives on this 😉).

Popolechien commented 1 month ago

Ok I found at least one of them videos where the guy spoke French so I guess we're good-ish

benoit74 commented 1 month ago

Ok I found at least one of them videos where the guy spoke French so I guess we're good-ish

I don't get why only audio matters. When you have the title + the description + the subtitle in French, isn't this usable by any French person?

Anyway, it matches what has been decided to do so far (do not hesitate to open a dedicated issue if this really needs to be discussed), so I just requested all remaining ZIMs for TED by topics.

Jaifroid commented 1 month ago

isn't this usable by any French person?

Maybe even some Swiss people 🤣

Popolechien commented 1 month ago

Well my expectation for a video labeled as "French" (or German, Polish, etc.) is that the video itself is in that language. Using subtitles (or having the layout and description translated) does not deviate from the fact that the original / main content is not in the language advertised. Dubbing, on the other hand, would make me consider it as "in French".

Same as if I watch a Korean movie on Netflix: no matter my home settings and the use of subtitles I would still see it as a Korean-language movie if on-screen characters speak in that language.

Popolechien commented 1 month ago

Could you share the link where you saw this?

Not sure the question was for me, but just in case: I simply downloaded from dev and opened it via Kiwix Android (i.e. not from the app, so no link)

benoit74 commented 1 month ago

Not sure the question was for me, but just in case: I simply downloaded from dev and opened it via Kiwix Android (i.e. not from the app, so no link)

It was for you, yes, sorry, I didn't expect someone else to react in the meantime ^^

OK then this is an Android issue, please open an issue in github.com/kiwix/kiwix-android

Popolechien commented 1 month ago

done at https://github.com/kiwix/kiwix-android/issues/3892

Jaifroid commented 1 month ago

Well my expectation for a video labelled as "French" (or German, Polish, etc.) is that the video itself is in that language. Using subtitles (or having the layout and description translated) does not deviate from the fact that the original / main content is not in the language advertised. Dubbing, on the other hand, would make me consider it as "in French".

I think we should focus on discovery: do we want French/Belgian/Francophone, and maybe even a few monolingual Swiss (not sure if such a creature exists 😉), to be able to discover the content if they have set their Library language to "Français"?

Popolechien commented 1 month ago

I'm all for encouraging people to be better persons, but I'm less sure about tricking them into it 😉