kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0

Prevent unnecessary extraction of all subtitle languages for playback of media in the ZIM #445

Open Jaifroid opened 5 years ago

Jaifroid commented 5 years ago

As discussed in #441, in jQuery mode at least, all text tracks (subtitles / closed captions) of a media block (audio / video) are currently extracted from the ZIM after the media source is loaded in the page. While this isn't necessarily perceptible to the user, some pages have up to 30 different subtitle files. These files are heavily compressed in the ZIM, and extracting them blocks the engine for some time after the media has loaded. This doesn't affect playback, although it does prevent selection of subtitles towards the end of the subtitle list until all the files are extracted. It is also wasteful of CPU / battery, and could cause overheating on lower-spec devices. Furthermore, because of #426 (redundant extraction of assets after navigation), these files will continue to be extracted even after the user has navigated to another page (e.g. if they decided not to watch a video).

There are two strategies currently under consideration to fix this:

  1. Construct a custom selector to allow the user to select the subtitle they wish. This has the advantage that we can fully control and monitor which events fire in this selector, but has the disadvantage that we alter the page HTML and layout slightly.
  2. Use the existing CC selector for those browser versions with widgets that allow selection of subtitles. This has the disadvantage that different browsers fire different events on selecting a new subtitle (see discussion in #441), and IE11 does not fire any event at all.

I shall attempt a solution using the second strategy first, with capability detection and fallback to extracting all subtitles.
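As a rough illustration of the second strategy, the sketch below listens for the `change` event on the media element's `textTracks` list and extracts only the track whose `mode` has become `showing`, falling back to extracting everything when that event does not appear to be available. It is a minimal sketch, not the actual kiwix-js code: `setupSubtitleExtraction` and the `extractSubtitle` callback are hypothetical names, and testing for `'onchange' in tracks` is an assumed heuristic that may not be enough on its own, given that browsers differ in which events they fire (see #441). It is written in ES5 because IE11 is one of the targets.

```javascript
// Sketch only. `extractSubtitle` is a hypothetical callback that would pull
// one subtitle file out of the ZIM; it is not an existing kiwix-js function.
function setupSubtitleExtraction(media, extractSubtitle) {
    var tracks = media.textTracks;
    if (!tracks) return 'none'; // no text track support at all

    var requested = []; // tracks already handed to extractSubtitle

    function extractOnce(track) {
        if (requested.indexOf(track) === -1) {
            requested.push(track);
            extractSubtitle(track);
        }
    }

    // Capability detection (assumed heuristic): can we observe the user's
    // subtitle selection through a 'change' event on the track list?
    var canDetectSelection = 'onchange' in tracks &&
        typeof tracks.addEventListener === 'function';

    if (!canDetectSelection) {
        // Fallback: keep the current behaviour and extract every subtitle
        for (var i = 0; i < tracks.length; i++) extractOnce(tracks[i]);
        return 'eager';
    }

    // Lazy path: extract a subtitle only once the user has selected it
    tracks.addEventListener('change', function () {
        for (var j = 0; j < tracks.length; j++) {
            if (tracks[j].mode === 'showing') extractOnce(tracks[j]);
        }
    });
    return 'lazy';
}
```

In a page this might be called as `setupSubtitleExtraction(videoElement, function (track) { /* fetch the file for track.language */ })`. The return value only reports which path was taken, which is handy when checking the behaviour in different browsers.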

Jaifroid commented 5 years ago

In Firefox 63.0.3 (running in WSL Linux) all the subtitles are extracted in SW mode as well (I've just checked by placing a breakpoint on the message channel of SW). In Chromium and Edge, they are not extracted in SW mode. However, in both those browsers, subtitles can't be selected in the interface, and the text tracks have all been removed from the video block (probably read into a video.js array).

If I disable the modification that supports JavaScript in SW mode, the video plays fine in SW mode, the CC menu is available in both Chromium and Edge, and subtitles can be selected. So I guess this confirms a bug in the included video.js? This was tested in the latest TEDx global issues ZIM.

kelson42 commented 4 years ago

@Jaifroid Over the past 10 months, a lot of work has been invested in the support of videos in ZIM files. I believe that this ticket might benefit from a refresh based on a recent ZIM file. Would you be able to confirm that the problem still occurs, please? If "yes", I believe we should try to identify exactly where the problem is (maybe we could fix something in the ZIM itself?). I'd also question a bit whether this is a bug :) @rgaudin Do we have a good demonstration ZIM file with subtitles?

Jaifroid commented 4 years ago

@kelson42 Yes, I'd be happy to test.

rgaudin commented 4 years ago

@Jaifroid please test with newer ZIM files from the youtube tag or the ted category at https://farm.openzim.org/recipes

For youtube, we have an --all-subtitles option that makes rendering slow, in-ZIM or not, because it includes auto-generated subtitles, of which there are usually 100+.

We don't use that option in the ZIM we create though.

For TED, we have a flexible --subtitles option that can match the language request or include all available subtitles. That's what we do on the ZIM we create. The subtitles are not auto-generated, but there are still a good number of them. I'd advise you to test with the latest TED Global Issue so you can compare with what you had.