openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Add support for MediaSource requests #329

Closed benoit74 closed 3 months ago

benoit74 commented 3 months ago

https://github.com/openzim/zimit/issues/323 has shown that the YouTube player has been updated to use a new kind of request for video content.

For now, a fix is implemented in wabac.js (and embedded in Browsertrix crawler in our case) to force the YouTube player to fall back to the previous request mode.

However, we have no idea how long YouTube will keep supporting this fallback, and we might well need a way to properly handle the new request mode.

If I understood correctly, the component doing the requests is in fact not specific to YouTube: it is simply the MediaSource API, now supported by major browsers.

We hence have:

This is a medium-term issue but a very concerning one.

ikreymer commented 3 months ago

The way this is handled more generally in wabac.js is with DASH and HLS manifest rewriting. Those are existing standards for video streaming and have been in use for many years. This is a good fallback, but some platforms, such as YouTube and Vimeo, don't quite use the standard formats and specify stream chunks in other ways. Capturing and replaying content with MediaSource is already possible and already happening; it's just less predictable and often requires additional work.
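To make "manifest rewriting" a bit more concrete, here is a minimal Python sketch of one possible form of it for HLS: reducing a master playlist to a single variant so that the player always requests the same, predictable stream at replay time. This is only an illustration under stated assumptions, not the actual wabac.js code. The function name `keep_single_variant`, the `max_bandwidth` parameter and the sample playlist are all hypothetical.

```python
import re
from typing import Optional


def keep_single_variant(playlist: str, max_bandwidth: Optional[int] = None) -> str:
    """Reduce an HLS master playlist to one variant stream (illustrative only).

    Picks the highest-BANDWIDTH variant not exceeding max_bandwidth, or the
    lowest-bandwidth variant if none qualifies. Other lines are kept as-is.
    """
    lines = playlist.splitlines()

    # Collect (bandwidth, line index) for every variant declaration.
    # The [:,] prefix avoids matching the AVERAGE-BANDWIDTH attribute.
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            match = re.search(r"[:,]BANDWIDTH=(\d+)", line)
            if match:
                variants.append((int(match.group(1)), i))

    if not variants:
        # Not a master playlist (or nothing to choose from): leave untouched.
        return playlist

    eligible = [v for v in variants if max_bandwidth is None or v[0] <= max_bandwidth]
    chosen = max(eligible)[1] if eligible else min(variants)[1]

    out = []
    skip_uri = False  # True while waiting for the URI line of a dropped variant
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            skip_uri = i != chosen
            if skip_uri:
                continue
        elif skip_uri and line and not line.startswith("#"):
            skip_uri = False
            continue
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample master playlist with two variants.
SAMPLE = (
    "#EXTM3U\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360\n"
    "low.m3u8\n"
    "#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=2500000,BANDWIDTH=3000000\n"
    "high.m3u8\n"
)

if __name__ == "__main__":
    print(keep_single_variant(SAMPLE, max_bandwidth=1_000_000))
```

A rewrite like this only works when the platform serves a standard manifest. When stream chunks are specified in other ways, there is no manifest to rewrite in the first place.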

benoit74 commented 3 months ago

Thank you for these details! Closing this, then, since it looks like we probably have little to do in warc2zim for now.