openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

Restore support for Youtube-hosted videos (videos not hosted on TED CDN) #164

Closed: benoit74 closed this issue 3 months ago

benoit74 commented 4 months ago

See https://github.com/openzim/zim-requests/issues/849 for details

benoit74 commented 4 months ago

https://www.ted.com/talks/william_sieghart_the_connective_potential_of_poetry has a null h264 property in playerData.resources.

In such a case the scraper does not work, while the player on the web page falls back to a YouTube player, as can be seen in the screenshot below:

image

Other videos hosted on TED CDN have a different player:

image

This issue is therefore more about supporting YouTube-hosted videos when the TED-hosted one is not available. This impacts many topics, where some videos are missing because they are hosted only on YouTube. I'm currently running an evaluation of the big science topic.
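To illustrate the fallback described above, here is a minimal Python sketch of how a scraper could choose a video source. Only `playerData.resources.h264` comes from this issue; the `external` object with its `service` and `code` keys, the `file` key on `h264` entries, and the function name `pick_video_source` are assumptions made for illustration and may not match the real TED JSON or the scraper's code.

```python
from typing import Optional


def pick_video_source(player_data: dict) -> Optional[dict]:
    """Pick a video source from a (hypothetical) playerData dict.

    Returns a dict with "kind" and "url", or None if nothing is usable.
    """
    resources = player_data.get("resources") or {}
    h264 = resources.get("h264")
    if h264:
        # TED CDN case. Assumption: h264 is a list of entries with a "file" URL.
        return {"kind": "ted_cdn", "url": h264[0]["file"]}

    # h264 is null or missing: try the YouTube fallback.
    # Assumption: the external video is described by "service" and "code" keys.
    external = player_data.get("external") or {}
    service = str(external.get("service", "")).lower()
    code = external.get("code")
    if service == "youtube" and code:
        return {"kind": "youtube", "url": f"https://www.youtube.com/watch?v={code}"}

    # Neither source is available: the caller can log and skip this talk.
    return None
```

With a null `h264` and a YouTube `external` entry, the function returns a `youtube` source instead of ignoring the talk, which is the behaviour this issue asks to restore.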

benoit74 commented 4 months ago

Btw, the Firefox vs Chrome/Brave situation is just a side effect we should not care about; it is mainly a TED problem in fact ^^

benoit74 commented 4 months ago

In the science topic, 55 videos have been ignored. It looks like all of them could have been downloaded. The total number of videos in this topic is 1489, so this represents about 3.7%.
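The share of ignored videos can be checked with a quick calculation. This is only a sketch; the variable names are illustrative.

```python
ignored = 55    # videos skipped in the science topic
total = 1489    # all videos in the science topic

share = ignored / total * 100
print(f"{share:.1f}% of videos ignored")
```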

rgaudin commented 4 months ago

Good to know!

benoit74 commented 3 months ago

I should have written "restore support for ...". It was working in the past, and most of the code is still there; we just do not properly parse the video JSON data anymore.

rgaudin commented 3 months ago

Good news then (I think)

benoit74 commented 3 months ago

Very good, yes. I just had to add a bit more stuff to align everything a bit better, as you will see in the PR, but at least it was fast and easy.

benoit74 commented 3 months ago

Thank you for remembering so well how the scraper worked in the past, it definitely helped (together with the usual git bisect, I love this tool ^^)