Closed by benoit74 3 months ago.
https://www.ted.com/talks/william_sieghart_the_connective_potential_of_poetry has a null `h264` property in `playerData.resources`.
In such a case, the scraper does not work, whereas the player on the web page falls back to a YouTube player, as can be seen in the screenshot below:
Other videos hosted on TED CDN have a different player:
This issue is therefore more about supporting YouTube-hosted videos when the TED-hosted one is not available. This impacts many topics, where some videos are missing because they are hosted only on YouTube. I'm currently running an evaluation of the big `science` topic.
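The fallback behavior described above (use the TED-hosted file when `h264` is present, otherwise use the YouTube copy the web page falls back to) can be sketched in Python. This is a minimal illustrative sketch, not the scraper's actual code: the function name `pick_video_source` and the `external`, `service`, `code`, `bitrate` and `file` keys are hypothetical assumptions about the shape of `playerData`; only `playerData.resources` and its `h264` property come from this issue.

```python
import json
from typing import Optional


def pick_video_source(player_data: dict) -> Optional[dict]:
    """Choose where to download a talk's video from.

    Hypothetical data shape (assumption, not TED's documented format):
    resources.h264 is null or a list of {"bitrate": int, "file": str},
    and "external" looks like {"service": "YouTube", "code": "<id>"}.
    """
    resources = player_data.get("resources") or {}
    h264 = resources.get("h264")
    if h264:
        # A TED CDN copy exists: prefer the highest-bitrate rendition.
        best = max(h264, key=lambda r: r.get("bitrate", 0))
        return {"source": "ted", "url": best["file"]}
    external = player_data.get("external") or {}
    if external.get("service") == "YouTube" and external.get("code"):
        # h264 is null: fall back to the YouTube copy, like the web player does.
        return {
            "source": "youtube",
            "url": f"https://www.youtube.com/watch?v={external['code']}",
        }
    # No usable source at all: the scraper would have to skip this video.
    return None


# Example: a talk whose playerData has a null h264 property.
sample = json.loads(
    '{"resources": {"h264": null}, '
    '"external": {"service": "YouTube", "code": "VIDEO_ID"}}'
)
print(pick_video_source(sample))
```

Returning `None` rather than raising keeps the "skip this video" decision with the caller, which is roughly what happens today for the videos that get ignored.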
Btw, the Firefox vs Chrome/Brave situation is just a side effect we should not care about; it is mainly a TED problem in fact ^^
In the `science` topic, 55 videos have been ignored. It looks like all of them could have been downloaded. The total number of videos in this topic is 1489, so this represents about 3.7%.
Good to know!
I should have written "restore support for ...". It was working in the past, and most of the code is still there; we just no longer parse the video JSON data properly.
Good news then (I think)
Very good, yes. I just had to add a bit more to align everything better (you will see in the PR), but at least it was fast and easy.
Thank you for remembering so well how the scraper worked in the past; it definitely helped (together with the usual `git bisect`, I love this tool ^^).
See https://github.com/openzim/zim-requests/issues/849 for details