openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

(Some?) Zimit-based videos on Android (official app and PWA) and iOS fail, whereas they work on desktop #144

Closed Jaifroid closed 6 months ago

Jaifroid commented 10 months ago

For example, in www.ready.gov_es_2023-12.zim, go to "Recursos" and then "Listo en su idioma", and scroll to the bottom of the page. The video there plays fine on desktop (Kiwix Serve / Browser Extension / PWA), but fails to play on the official Android app and on the Replay-enabled PWA on Android (see https://github.com/kiwix/kiwix-js-pwa/issues/512). The behaviour is exactly the same on both of the latter: playback ends with the same failure message, shown in what is presumably the local language of the scraper's IP address (screenshot below).

Now, I don't know if this is the video issue on Android referred to by @mgautierfr at some point (something to do with video needing to be stored in a contiguous block), or if it's simply that Android requires a different version/codec than the one scraped by the browser version in Zimit. I hope it's something that might be fixable in warc2zim (for Zimit v2.0).

All videos I tested in that ZIM seem to be affected, the one highlighted is just an example.

image

Jaifroid commented 9 months ago

I've now tested this issue on a Zimit2 archive mes-quartiers-chinois_fr_all_2024-01.zim (the one with fixed video), and can confirm that the issue also occurs in Android players for the YouTube videos in this ZIM. To be clear: these videos are playable on Kiwix Serve or the Desktop-based PWA, but do not work either on the Android PWA or on my copy of the Android app (3.9.1, Build 7230901). I continue to believe that this is a format issue (Android might be requesting a different video format, one that is not scraped, which doesn't necessarily mean that Android wouldn't be able to use the version that is scraped, but could just mean that the video player can't access it on Android).

image

Jaifroid commented 9 months ago

OK, I've now determined that the underlying YouTube mp4 files in the Zimit2 mes-quartiers ZIM can perfectly well be played on Android. For example, going directly to this ZIM URL: C/youtube.fuzzy.replayweb.page/videoplayback%3Fid%3Do-AAsbrEZ3PaNrpSlYax6n7VZ6Xnl80aUag3s8XzT-mkSO, allows the video to play in the PWA app.
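As an illustration of the direct URL quoted above, here is a minimal sketch of how an entry path containing a query string could be turned into a requestable path. It assumes the reader expects each path segment to be percent-encoded, so that `?` and `=` become `%3F` and `%3D` as in the URL above. The helper name `zimEntryUrl` and the id `EXAMPLE_ID` are hypothetical and not part of any Kiwix API.

```typescript
// Hypothetical helper: build a direct path to a ZIM entry.
// The "C" namespace prefix mirrors the URL quoted in the comment above.
function zimEntryUrl(entryPath: string, namespace: string = "C"): string {
  // Encode each segment separately so "/" keeps separating segments,
  // while "?" and "=" inside a segment become %3F and %3D.
  const encoded = entryPath
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `${namespace}/${encoded}`;
}

// EXAMPLE_ID is a placeholder, not a real video id.
const directUrl = zimEntryUrl(
  "youtube.fuzzy.replayweb.page/videoplayback?id=EXAMPLE_ID",
);
console.log(directUrl);
// C/youtube.fuzzy.replayweb.page/videoplayback%3Fid%3DEXAMPLE_ID
```

Requesting such a path bypasses the JS player entirely, which is why it is a useful check that the MP4 itself is present and playable.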

I hate to say it, but I wonder if something like #172 or #173 might be necessary to play these videos on Android...

benoit74 commented 9 months ago

The issue also happens on iOS, see https://github.com/openzim/zim-requests/issues/809#issuecomment-1924416027 and https://github.com/openzim/zim-requests/issues/809#issuecomment-1924479842

Also see https://github.com/openzim/zim-requests/issues/809#issuecomment-1926385078, I assume that the report of Orca video working on "mes_quartiers_chinois" is wrong.

Jaifroid commented 9 months ago

> The issue also happens on iOS, see openzim/zim-requests#809 (comment) and openzim/zim-requests#809 (comment)
>
> Also see openzim/zim-requests#809 (comment), I assume that the report of Orca video working on "mes_quartiers_chinois" is wrong.

This issue was specifically about video failing on Android. The Browser Extension isn't available on mobile, but the PWA is available for both Android and iOS. I confirm that the "orca" video in 100rabbits isn't playing in the Android PWA, the official Android app, or on the iOS PWA (whereas it does work on desktop Chrome). I've found a specific blocker for iOS (regex incompatibility in the fuzzy substitutions) which I'll report in a separate issue -- which makes me think that the iOS problem is not the same as the Android problem, though this needs further investigation.

Jaifroid commented 9 months ago

Interesting data point for this issue:

This pretty much implies that the error is not in the Zimit v2 code -- given that it is an identical error for the same archive in its zimit1 and zimit2 versions -- and that the error was introduced some time between 15th September and 14th December last year.

It means one of these commits could well be the culprit:

image

@mgautierfr They are your commits, so you might be able to identify where an incompatibility might have been introduced for Android players.

benoit74 commented 9 months ago

These commits are a simple reorganization of code, splitting one single file into a few files. I doubt they have any impact; they are probably harmless. I think it is far more likely that something changed at YouTube. The Browsertrix Crawler / Puppeteer versions changed as well.

For those wanting to compare a recent scrape of zimit1 vs zimit2, we now have a fresh ZIM made with zimit1, using the same Browsertrix-Crawler 0.12.4 (with warcio.js 1.6.2 and pywb 2.7.4) as zimit2, and the same recipe configuration that avoids scraping useless pages: https://dev.library.kiwix.org/viewer#mes-quartiers-chinois_fr_zimit1_2024-02

benoit74 commented 9 months ago

One big change in Browsertrix crawler might be the move to Brave instead of Chrome, with browsertrix crawler 0.12 which was adopted in zimit 1.6 on Nov 2 (exactly during your period).

Jaifroid commented 9 months ago

> One big change in Browsertrix crawler might be the move to Brave instead of Chrome, with browsertrix crawler 0.12 which was adopted in zimit 1.6 on Nov 2 (exactly during your period).

It's possible, though all the symptoms point to an incompatibility on the player side rather than on the scraper side. Browsertrix runs on desktop rather than Android. After all, the MP4 is correctly scraped and is in the ZIM, and can be played on Android by accessing it directly from its URL.

However, the issue with broken iOS video is clearly caused by an incompatible regex in the fuzzy transformations (zimit2), during playback. It seems more likely to me that it is something "simple" (but tricky) like that on the Android player side -- at least we should eliminate that possibility first.

benoit74 commented 9 months ago

OK, makes sense to keep focused on JS.

Let me know if you change your mind and if it could help if I create small WARC files of one single blog post of mes_quartiers_chinois with various browsertrix crawler versions, to confirm there is no problem on that side.

Jaifroid commented 9 months ago

> OK, makes sense to keep focused on JS.
>
> Let me know if you change your mind and if it could help if I create small WARC files of one single blog post of mes_quartiers_chinois with various browsertrix crawler versions, to confirm there is no problem on that side.

I'm just trying to dig a bit more into the JS logs during playback. If we hit a blank there, then what you say could certainly be useful.

Looking at the zimit1 ZIM of 100r from December, as served by Kiwix Serve here: https://library.kiwix.org/content/100r-off-the-grid_en_2023-12/A/100r.co/site/off_the_grid.html, it's clear that the fuzzy transformation rules are being applied in Chrome Desktop (screenshot left) and are not being applied in Chrome on Android (screenshot right). The screenshot shows what happens at the very moment I press play on the video. I tried the same experiment with the zimit2 version, but it's uninformative because the fuzzy transformations do not emit any console.log messages. It would need to be debugged step-by-step, which is hard to do on Android with my current setup.

Comparison_zimit1

Jaifroid commented 9 months ago

Right, I managed to debug https://dev.library.kiwix.org/content/100rabbits_en_2024-01/100r.co/site/orca.html on desktop and on Android, using remote debugging to my phone in Chrome.

I confirm that the fuzzy transformation rules DO run in this version, in wombat_setup.js. They are hit three times in Desktop when playing the video, but they are hit only twice on Android when attempting to play the video. I'm running out of time to do more testing, but it looks very much like we need to alter the regular expression, empirically, so that when the player asks for an Android version of the video (it may be as simple as having something like &player=android [fictional] in the query string), it gets hit by the transformation and returns the video in the ZIM.
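To make the idea concrete, here is a minimal sketch of a fuzzy rule that maps differently shaped video requests onto one canonical ZIM entry. It is not the rule set shipped in wombat_setup.js. The `FuzzyRule` shape, the `googlevideo.com` host pattern, the sample URLs, and `EXAMPLE_ID` are assumptions for illustration, and `player=android` is the fictional parameter from the comment above.

```typescript
// Hypothetical rule shape: NOT the actual structure used by wombat_setup.js.
interface FuzzyRule {
  match: RegExp;
  // Builds the canonical ZIM path, or null if the URL cannot be canonicalized.
  canonicalize: (url: URL) => string | null;
}

const videoPlaybackRule: FuzzyRule = {
  // Assumed host pattern for the video CDN.
  match: /^https?:\/\/(?:[^/]+\.)?googlevideo\.com\/videoplayback\?/,
  // Keep only the "id" parameter, wherever it appears in the query string.
  canonicalize: (url) => {
    const id = url.searchParams.get("id");
    return id ? `youtube.fuzzy.replayweb.page/videoplayback?id=${id}` : null;
  },
};

let ruleHits = 0; // counter for comparing desktop and Android sessions

function applyFuzzy(requested: string, rules: FuzzyRule[]): string {
  for (const rule of rules) {
    if (!rule.match.test(requested)) continue;
    const canonical = rule.canonicalize(new URL(requested));
    if (canonical !== null) {
      ruleHits += 1;
      return canonical;
    }
  }
  return requested; // no rule applied: the request goes through unchanged
}

// Both request shapes are invented; EXAMPLE_ID is a placeholder.
const desktopRequest =
  "https://rr1.example.googlevideo.com/videoplayback?expire=1&id=EXAMPLE_ID&itag=18";
const androidRequest =
  "https://rr1.example.googlevideo.com/videoplayback?id=EXAMPLE_ID&itag=18&player=android";

console.log(applyFuzzy(desktopRequest, [videoPlaybackRule]));
console.log(applyFuzzy(androidRequest, [videoPlaybackRule]));
// Both lines print: youtube.fuzzy.replayweb.page/videoplayback?id=EXAMPLE_ID
```

Because the rule depends only on the `id` parameter, the parameter order and any extra parameters do not matter. A hit counter like `ruleHits` is one way to compare how many times the rules fire on desktop and on Android.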

@benoit74 @mgautierfr Whether this is caused by a new crawler or not, it does look as if the solution is to craft the regular expressions appropriately to catch what the Android player is looking for.

I have to stop tinkering now, but I hope this has given some leads.

benoit74 commented 9 months ago

This gave pretty good leads indeed, at least good directions to look into, thank you very much @Jaifroid

kelson42 commented 9 months ago

My remark is only slightly related to the issue, but I want to emphasize again that we should scrape Web sites with mobile in mind (so with a mobile-like spider configuration).

mgautierfr commented 8 months ago

It appears that PR #172 is also the solution for YouTube video on Android. At least, I'm able to generate a ZIM (https://tmp.kiwix.org/ci/test-warc/mes_quartiers_youtube_vimeo_fix.zim) with YouTube video that plays in Chrome on Android (not tested in the PWA).[*]

However, the video doesn't play in the kiwix-android application. A custom ZIM replacing the player with an HTML video tag (https://tmp.kiwix.org/ci/test-warc/mes_quartiers_youtube_video_tag.zim) (made with PR #173) doesn't play either, so I suspect a codec issue or something else (but not a YouTube JS player problem, as there is no YouTube JS player in that ZIM).

[*] There also seems to be a bug somewhere: when it tries to load an about:blank page, Chrome puts a div in front of almost the whole view (with the kiwix-serve 404 message), which catches input and prevents the user from clicking play. The workaround is to scroll the YouTube player to the top so that it can be played (or to connect with debug tools and remove the div).

Jaifroid commented 8 months ago

Thanks, @mgautierfr. I can test that ZIM on the PWA on Android and report back.

Regarding video not playing on the Android app / webview -- don't forget the issue I raised in #174, that there is some syntax used in the regular expressions in wombat_setup.js that is incompatible with some older browsers, and which at least blocks playback on iOS / Safari < 16.4, and Chrome for Android < 121 (which is very recent!). Specifically, I identified the use of lookbehinds (see issue).
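For readers unfamiliar with the lookbehind problem, here is a minimal sketch of how support can be detected at runtime and how a lookbehind can often be replaced by a capture group. The pattern `(?<=[?&])id=[^&]+` is a hypothetical example in the spirit of a fuzzy rule, not a quote from wombat_setup.js.

```typescript
// Detect lookbehind support via the RegExp constructor. A regex literal with
// unsupported syntax is a parse-time SyntaxError, so it would stop the whole
// script from loading on engines that lack the feature.
function supportsLookbehind(): boolean {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch {
    return false;
  }
}

// Lookbehind form (hypothetical):  (?<=[?&])id=[^&]+
// Lookbehind-free form: consume the delimiter in a non-capturing group and
// capture only the part that is wanted.
const ID_PARAM = /(?:^|[?&])(id=[^&]+)/;

function extractIdParam(query: string): string | null {
  const m = ID_PARAM.exec(query);
  return m?.[1] ?? null;
}

console.log(supportsLookbehind());
console.log(extractIdParam("?expire=1&id=EXAMPLE_ID&itag=18")); // id=EXAMPLE_ID
console.log(extractIdParam("?video_id=EXAMPLE_ID")); // null
```

The capture-group form is accepted by engines that reject lookbehinds, at the cost of reading the result from group 1 instead of the whole match.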

Is it possible that the Android app's webview has some incompatibilities with the syntax? Do we know what version of Chromium the webview is based on?

Jaifroid commented 8 months ago

I confirm that both of the above linked ZIMs (https://tmp.kiwix.org/ci/test-warc/mes_quartiers_youtube_vimeo_fix.zim and https://tmp.kiwix.org/ci/test-warc/mes_quartiers_youtube_video_tag.zim) play perfectly in the PWA on Android! Yay! 😊

I also confirm that neither video works in my version of the official Android app. ☹️

Can anyone confirm whether Zimit-based video ever worked in the Android app? I have an old Raspberry PI Docs ZIM from 2023-01, and in the official Android app the video on the "Getting Started" page displays the same symptoms as the JS-based video playback in the "fix" ZIM above.

Jaifroid commented 8 months ago

I've tested those two ZIMs on iOS 17 on an iPad (not in official app, but using PWA). Neither video works, but the differences between the two might be instructive.

In the case of the JS player version (fix), the fuzzy matching rules in the screenshot are never hit with anything relevant:

image

(We get a single hit on a very short stats link, and the only thing that is transformed is that the contents of the querystring after the question mark are removed -- nothing to do with googlevideo or fuzzy.) See UPDATE* at bottom of post.

In the case of the HTML5 player version (tag), the player shows, but no video is loaded, and it can't be played. If I go to the direct ZIM link for the video, I get a play icon with a cross through it. In console log, the following is shown:

image

The plugin (presumably the HTML5 plugin? EDIT: this is direct play, not via HTML5 player) can't handle loading the video, at least from a ZIM URL. This could be a codec issue, though it seems really unlikely that standard MP4 video can't be played on an iPad... Or it could be to do with the internal workings of whatever internal plugin Safari uses to handle video.

The experiment needs to be repeated with Kiwix Serve, to confirm, but I'm not sure how to load these tmp files into a remote instance of Kiwix Serve.

UPDATE* I was hitting the bug with not being able to click. Once I managed to click, the fuzzy matching rule did indeed hit for the video URL and transformed it correctly. However, the player still can't play it, and shows the following message:

image

Jaifroid commented 8 months ago

And finally a piece of good (-ish) news:

On this iOS 17 iPad, I was able to play the video in the case of the tag (HTML5) ZIM using jQuery mode, which goes and gets the video itself, turns it into a BLOB URL, and inserts that as the video src (look at the line below the highlighted one in console.log). This proves that it's not a codec issue, but instead an issue with how the URL is read and retrieved. It also means there is hope for a workaround...
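The Blob URL approach described above can be sketched roughly as follows. This is not the PWA's actual jQuery-mode code: `toBlobUrl` and the `readEntry` callback are hypothetical names, and the sketch assumes the reader can obtain the video bytes from the ZIM by itself.

```typescript
// Hypothetical sketch: fetch the bytes outside the player, then hand the
// player a blob: URL instead of the original ZIM URL.
async function toBlobUrl(
  readEntry: () => Promise<ArrayBuffer>, // assumed reader-side accessor for the ZIM entry
  mimeType: string,
): Promise<string> {
  const bytes = await readEntry();
  const blob = new Blob([bytes], { type: mimeType });
  return URL.createObjectURL(blob);
}

// In a page, the result would be assigned to the element, for example:
//   videoElement.src = await toBlobUrl(() => readVideoFromZim(), "video/mp4");
// and released with URL.revokeObjectURL(videoElement.src) once the element
// no longer needs it. readVideoFromZim is a placeholder, not a real API.
```

With this approach the media element never requests the ZIM URL itself, which fits the observation that the failure lies in how the URL is read and retrieved and not in the codec.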

image

kelson42 commented 6 months ago

@benoit74 I believe you have found the explanation needed here. Maybe it deserves a comment, and then this can be closed?

benoit74 commented 6 months ago

It is explained here: https://github.com/openzim/zimit/issues/291

I forgot to close this issue as well.