Closed benoit74 closed 2 months ago
This is in fact a Zimit issue, and most probably has nothing to do with Zimit2. I'm transferring it to zimit repo and will give more explanations once transferred.
I've done some tests with zimit2 and warc2zim2 (`url_handling` branch from PR https://github.com/openzim/warc2zim/pull/218, but we will see it does not matter).
Browsertrix crawler is hence 1.0.0 beta-6
I ran 4 different tests:
- A: no `--mobileDevice` and zimit custom user agent
  `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit contact+zimfarm@kiwix.org" --cwd /output/.tmppqvsfui5 --combineWARC`
- B: `--mobileDevice` and no user agent customization
  `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --cwd /output/.tmppqvsfui5 --combineWARC`
- C: `--mobileDevice` and zimit user agent customization
  `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit contact+zimfarm@kiwix.org" --cwd /output/.tmppqvsfui5 --combineWARC`
- D: no `--mobileDevice` but with a user agent looking like a Pixel 2
  `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36 +Zimit contact+zimfarm@kiwix.org" --cwd /output/.tmppqvsfui5 --combineWARC`
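The four runs differ only in whether `--mobileDevice` and `--userAgent` are passed, so they can be generated from one base command. The following is a minimal Python sketch of that test matrix, not the script actually used for the tests. The names `BASE_ARGS`, `VARIANTS` and `build_command` are hypothetical, and the mapping of the four runs to the labels A to D assumes they are listed in the same order as the table columns.

```python
import shlex

# Flags shared by all four runs, copied from the commands above.
BASE_ARGS = [
    "crawl",
    "--failOnFailedSeed",
    "--waitUntil", "load",
    "--behaviors", "autoplay,autofetch,siteSpecific",
    "--url", "https://tmp.kiwix.org/ci/test-website/youtube.html",
]

ZIMIT_SUFFIX = "+Zimit contact+zimfarm@kiwix.org"
DESKTOP_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15 " + ZIMIT_SUFFIX
)
PIXEL2_UA = (
    "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile "
    "Safari/537.36 " + ZIMIT_SUFFIX
)

# Per-run extras: (mobile device or None, user agent or None).
# Assumption: runs are labelled A to D in the order they are listed.
VARIANTS = {
    "A": (None, DESKTOP_UA),
    "B": ("Pixel 2", None),
    "C": ("Pixel 2", DESKTOP_UA),
    "D": (None, PIXEL2_UA),
}


def build_command(test, cwd="/output/.tmppqvsfui5"):
    """Return the argument list for one of the four test runs."""
    device, user_agent = VARIANTS[test]
    args = list(BASE_ARGS)
    if device is not None:
        args += ["--mobileDevice", device]
    if user_agent is not None:
        args += ["--userAgent", user_agent]
    args += ["--cwd", cwd, "--combineWARC"]
    return args


for name in VARIANTS:
    # shlex.join quotes each argument so the line can be pasted in a shell.
    print(name, shlex.join(build_command(name)))
```

Keeping the shared flags in one place makes it easier to see that the only variables across runs are the device and the user agent.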
Device / Reader | A | B | C | D |
---|---|---|---|---|
MacOS 12.7.4 - Kiwix reader opened in Firefox | ✅ | ✅ | ✅ | ✅ |
MacOS 12.7.4 - Kiwix native app (3.3.0 build 145) | ❌ | ✅ (very slow to load) | ❌ | ✅ (very slow to load) |
iPhone 13 (iOS 15) - Kiwix reader opened in Safari | ❌ | ✅ | ❌ | ✅ |
Fairphone 4 5G (Android 13) - Kiwix reader opened in Firefox | ❌ | ✅ | ❌ | ✅ |
Even though testing more readers will be important, the conclusion seems pretty clear.
For YouTube videos (at least), we must use a user agent other than the current one.
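To make the reading of the table explicit, here is a small Python sketch that tallies which test configurations play the video on every reader tested. The data is transcribed from the table above (✅ as `True`, ❌ as `False`). The names `RESULTS` and `passing_everywhere` are hypothetical, and "very slow to load" is counted as a pass.

```python
# Results transcribed from the table above; True means the video played.
RESULTS = {
    "MacOS 12.7.4 - Kiwix reader opened in Firefox":
        {"A": True, "B": True, "C": True, "D": True},
    "MacOS 12.7.4 - Kiwix native app (3.3.0 build 145)":
        {"A": False, "B": True, "C": False, "D": True},
    "iPhone 13 (iOS 15) - Kiwix reader opened in Safari":
        {"A": False, "B": True, "C": False, "D": True},
    "Fairphone 4 5G (Android 13) - Kiwix reader opened in Firefox":
        {"A": False, "B": True, "C": False, "D": True},
}


def passing_everywhere(results):
    """Return the test labels that succeeded on every device/reader row."""
    tests = sorted({label for row in results.values() for label in row})
    return [t for t in tests if all(row[t] for row in results.values())]


print(passing_everywhere(RESULTS))  # ['B', 'D']
```

Only B and D, the two runs whose user agent looks like a mobile device, pass on all readers. A and C, which both send the desktop-like zimit user agent, fail everywhere except the Kiwix reader in Firefox on MacOS.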
Previous work on https://github.com/openzim/zimit/pull/229 (where we switched by default to a mandatory UA and chose to use a "desktop-like" UA) was not totally a good idea. It helped solve some problems with the Python check of the URL, but caused other issues like this one.
Now that the Python check of the URL is gone, we should probably roll back most of the PR 229 changes:
- `--userAgent` and `--userAgentSuffix` in zimit code

I also recommend setting a default `--mobileDevice`, so that a proper user agent is passed (concatenated with our default `userAgentSuffix`), since it seems mostly mandatory for proper zimit operation. We should also add support for a new `--noMobileDevice` flag, which would omit the `--mobileDevice` argument from the Browsertrix crawler CLI call (should someone want to not use a mobile device; probably rare, but cheap to implement, and probably not needed to be exposed on Zimfarm).
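The proposed option handling could look roughly like the sketch below. It is a minimal Python/argparse illustration under stated assumptions, not the actual zimit code: the function names, the `"+Zimit"` suffix value and the `"Pixel 2"` default are placeholders (the latter is simply the device used in the tests), and only the flags discussed in this thread are modelled.

```python
import argparse

# Assumptions for illustration only, not values taken from zimit itself.
DEFAULT_MOBILE_DEVICE = "Pixel 2"
DEFAULT_UA_SUFFIX = "+Zimit"


def build_parser():
    """Parser with a default mobile device and an opt-out flag."""
    parser = argparse.ArgumentParser(prog="zimit-sketch")
    parser.add_argument("--mobileDevice", default=DEFAULT_MOBILE_DEVICE)
    parser.add_argument("--noMobileDevice", action="store_true")
    parser.add_argument("--userAgent", default=None)  # no default UA anymore
    parser.add_argument("--userAgentSuffix", default=DEFAULT_UA_SUFFIX)
    return parser


def crawler_args(argv):
    """Translate zimit-style options into crawler CLI arguments."""
    opts = build_parser().parse_args(argv)
    args = []
    if not opts.noMobileDevice:
        # Default or user-chosen device; the crawler derives the UA from it.
        args += ["--mobileDevice", opts.mobileDevice]
    if opts.userAgent:
        # Only forwarded when the user explicitly asks for a custom UA.
        args += ["--userAgent", opts.userAgent]
    if opts.userAgentSuffix:
        args += ["--userAgentSuffix", opts.userAgentSuffix]
    return args


print(crawler_args([]))
print(crawler_args(["--noMobileDevice"]))
print(crawler_args(["--mobileDevice", "iPhone 13"]))
```

With no options, the crawler receives the default device plus the suffix. `--noMobileDevice` drops the `--mobileDevice` argument entirely, and an explicit `--mobileDevice` overrides the default as before.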
Then comes the question of which default mobileDevice to choose. For tests I chose Pixel 2; the full list is here: https://github.com/puppeteer/puppeteer/blob/b144935789315697254880015847b2b4d151d52b/packages/puppeteer-core/src/common/Device.ts. A smaller screen might lead to situations where we are served smaller assets, which is more or less what we prefer, both to keep the ZIM size small and to work on all screen sizes. This was my logic when I chose Pixel 2 for tests.
Edit: fix the test table, second device was wrong
Note: I've also checked that in all cases the retrieved video is identical (same size, same codecs, ...), so the "fix" induced by using a more appropriate user agent is only linked to "other" content, not to the video codec or anything like that.
Just to confirm that the solutions B and D both work in the PWA and the Browser Extension. Was version B the adopted solution?
Yes, solution B is currently in place in the `zimit2` branch.
To be more precise, by default, "Pixel 2" is used as the mobile device. A Zimit user is free to override this setting with `--mobileDevice` (as before) or use `--noMobileDevice` to remove the default and use no mobile device.
We have to fix the situation where YouTube videos are not working everywhere.
We typically know that they do not play in kiwix-serve on Android Firefox / Chrome (while they should), and it looks like they do not play in kiwix-serve on Windows either: https://github.com/openzim/warc2zim/issues/206#issuecomment-2022247860