webrecorder / wabac.js

wabac.js - Web Archive Browsing Augmentation Client
https://replayweb.page
GNU Affero General Public License v3.0

Should Vimeo fuzzy rules be adapted? #169

Open benoit74 opened 3 months ago

benoit74 commented 3 months ago

FYI, in warc2zim2 we had to slightly adapt the Vimeo fuzzy rules so that they support more scenarios. I'm not sure this needs to be reflected in wabac, but I prefer to share the findings ^^

I did not take the time to test a WARC with your replay solution.

Change the video rewriting

See https://github.com/openzim/warc2zim/pull/228/commits/47e104c0f71c54a7060b3c8fca7422c7fe54bcb2

What I've observed is that in our test on https://website.test.openzim.org/vimeo.html, our adaptation of the fuzzy rule at https://github.com/webrecorder/wabac.js/blob/18b1286816779633491cbfb45c1b6c9524197633/src/fuzzymatcher.js#L15 wasn't matching at all, because the domain did not match (134vod-adaptive.akamaized.net) and because there were query parameters (I'm not sure whether this is a bug in our adaptation of the fuzzy rule).

I've decided for now to add support for the new domain and to keep the range parameter (which seems to be the only important one from a replay perspective).
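To illustrate the idea, here is a minimal Python sketch of such a normalization. It is not the actual warc2zim or wabac.js rule: the regular expressions, the function name `fuzzy_video_key` and the `vimeo-video.example` placeholder prefix are all assumptions made for the example. It accepts any host ending in `vod-adaptive.akamaized.net`, keeps the numeric path ending in `.mp4`, and drops every query parameter except `range`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical host pattern: any prefix followed by "vod-adaptive.akamaized.net",
# e.g. "134vod-adaptive.akamaized.net".
VIDEO_HOST = re.compile(r"^[\w-]*vod-adaptive\.akamaized\.net$")
# Hypothetical path pattern: trailing run of digits and slashes ending in ".mp4".
VIDEO_PATH = re.compile(r"/([\d/]+\.mp4)$")


def fuzzy_video_key(url: str) -> Optional[str]:
    """Return a normalized lookup key for a video URL, or None if it does not match."""
    parts = urlsplit(url)
    if not VIDEO_HOST.match(parts.hostname or ""):
        return None
    path_match = VIDEO_PATH.search(parts.path)
    if path_match is None:
        return None
    # "vimeo-video.example" is a made-up placeholder prefix for the key.
    key = f"vimeo-video.example/{path_match.group(1)}"
    # Keep only the "range" parameter; every other parameter is dropped.
    ranges = parse_qs(parts.query).get("range")
    if ranges:
        key += f"?range={ranges[0]}"
    return key
```

With this sketch, the URL recorded in the WARC and the URL requested at replay time map to the same key as long as they share the numeric path and the `range` value, whatever the other query parameters are.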

Rewrite preview image from the CDN

The preview image (the one displayed before the user starts the video) comes from the i.vimeocdn.com domain. Query parameters are added to request a size / quality matching the player's needs. From our experience, these query parameters are dynamically adapted, most probably based on viewport size or maybe other factors.

For instance, on my laptop there are two requests issued for the test video on https://website.test.openzim.org/vimeo.html:

But this is not what Browsertrix crawler got with --mobileDevice "Pixel 2":

We hence had to rewrite these URLs as well. For now, we decided to simply drop the query parameters. It is far from perfect, but from our experience there are just too many conditions to know which query parameter values would be present in the WARC and which would be requested at replay time.
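Dropping the query parameters can be sketched as follows in Python. This is only an illustration of the approach described above, not the code from warc2zim: the function name `strip_preview_query` is hypothetical, and the path and parameter values in the usage example are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_preview_query(url: str) -> str:
    """Drop the query string (and fragment) from i.vimeocdn.com URLs only."""
    parts = urlsplit(url)
    if parts.hostname != "i.vimeocdn.com":
        # Other hosts are returned unchanged.
        return url
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Made-up example URLs: both size variants collapse to the same URL.
print(strip_preview_query("https://i.vimeocdn.com/video/12345-abc?mh=200"))
print(strip_preview_query("https://i.vimeocdn.com/video/12345-abc?mh=720"))
```

The trade-off is the one noted above: every size variant maps to a single record, so replay serves whichever variant was captured, regardless of the size the player asks for.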

Ideally we would benefit from using the "greatest resolution available" ... but I could not find an easy way to do it. I considered rewriting only when the mh parameter is present, but that seems pretty fragile.