qgustavor / mkv-extract

Extract MKV files online, directly from your browser
https://qgustavor.github.io/mkv-extract/
MIT License

Use original mkvextract code #21

Closed qgustavor closed 1 year ago

qgustavor commented 2 years ago

Using the mkvextract code, for example by compiling it to WASM, would make it easy to support many features that mkvextract has and the current project doesn't, such as extracting video data, audio data, image subtitles, chapters and tags, as well as handling corrupted files.

By opening this issue I want to make clear that I don't intend to implement those features using the current codebase. This project started as an experiment: I wanted to demux MKV files and mux them into MP4 files in the browser, so that MKV files could be played in any browser that already supports MP4 files, as long as the browser also supports the codecs used. I only managed to implement the subtitle and attachment extraction code, which ended up being this tool.

Now, years later, I don't want to mess with the demux-mux idea since, IIRC, there are already tools that do that and browsers that support playing MKV files. On the other hand, I already started this project and some people (myself included) use it, so it makes sense to keep supporting it by adding the missing features and fixing issues. Since the objective now is just to provide a browser version of mkvextract, I think the best idea is to use the original code by compiling it to WASM and adding an easy-to-understand GUI.

qgustavor commented 2 years ago

One way to improve the existing code is to use this library, which implements everything done here (except for the GUI) better: https://github.com/mathiasvr/matroska-subtitles

Is it worth it? I think it is: while it would not add support for extracting anything new, like audio tracks or video tracks, it would add support for compressed subtitles and for corrupted files.

Edit: this library does not implement conversion from MKV to SSA/ASS files, just parsing. There is a related library that does that, but only with SRT subtitles. It's not a drop-in replacement and a lot of code would need to be rewritten, so using it is not worth it.

Edit 2: I've found a file that the current code cannot extract because there is something unexpected in it (probably the blocks are not in the expected order, I don't know), and using this library would help with that. On the other hand, MKV to SSA conversion (and all the missing mkvextract features, like setting up a file name template) would still need to be implemented, so using the original mkvextract code is better.

qgustavor commented 1 year ago

An alternative solution: using FFmpeg. There are already JavaScript builds of it, but with only WebM and MP4 support. The idea would be to take that code and modify it, removing encoders and decoders to save space and keeping only the muxers and demuxers, including a Matroska demuxer. It would not only allow extracting more streams (including audio and video streams) but also allow extracting data from MP4 files.

I tried doing that, but even without modifying anything in the code, the build failed. Maybe that's something weird with my environment (WSL). I will try again with another environment later.

qgustavor commented 1 year ago

Looks like it's not building because it's not compatible with the latest Emscripten. I tried compiling from this fork that supports the latest Emscripten and, without any modifications, it worked. Then I modified the mp4 build to make it useful for this project and it worked well.

The only issue is that there's no good way to stream outputs because of this Emscripten issue, so for now it would only be suitable for subtitle tracks and attachments, not video or audio tracks. It would still be useful because it would allow demuxing MP4 files (not just MKV files), and it would handle compressed tracks and corrupted files.

I will fork davedoesdev's repository and publish the changes I made to the library, then start working on a new GUI using FFmpeg as a backend, trying to reproduce most of gMKVExtractGUI's features if possible.

Edit: Maybe it will still be useful for extracting video and audio tracks, as long as they are not too long and thus fit in memory. Because it uses WORKERFS, the original video is not stored in memory. The only bad thing is that it would require more muxers: the mov muxer works for codecs that can be stored in M4A files, such as AAC and AC3. For other codecs, either the matroska muxer could be used to extract those tracks into mka files (which would be a bit weird since it's matroska to matroska) or more muxers would have to be added, like MP3 and FLAC muxers.

Edit 2: Adding the FLAC and MP3 muxers added only 30 KB to the output, which is a 52 KB JS file and a 1600 KB WASM file. Acceptable. Keeping the Matroska muxer is good for the MKA fallback, and it would allow using this build to remove tracks from an MKV file, adding some extra functionality from mkvtoolnix that's not present in gMKVExtractGUI.
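
To make the muxer choice above concrete, here is a rough sketch of how a front end might pick an output format per audio codec and build the FFmpeg arguments for a stream copy. Everything named here (`AudioTarget`, `pickAudioTarget`, `buildExtractArgs`, the lowercase codec names, the file names) is hypothetical and not taken from this project or from davedoesdev's build. It also assumes the build accepts ordinary FFmpeg command-line arguments (`-i`, `-map`, `-c copy`, `-f`) and that the muxers are registered as `mov`, `mp3`, `flac` and `matroska`.

```typescript
// Hypothetical helper, for illustration only: none of these names come from the project.
type AudioTarget = { muxer: string; extension: string };

// Codecs that have a dedicated container in the build described above.
// A Map is used so that odd codec strings never hit inherited object properties.
const DIRECT_TARGETS = new Map<string, AudioTarget>([
  ["aac", { muxer: "mov", extension: "m4a" }],   // mov muxer, M4A-compatible codec
  ["ac3", { muxer: "mov", extension: "m4a" }],   // mov muxer, M4A-compatible codec
  ["mp3", { muxer: "mp3", extension: "mp3" }],   // dedicated MP3 muxer
  ["flac", { muxer: "flac", extension: "flac" }] // dedicated FLAC muxer
]);

// Anything else falls back to Matroska audio (matroska to matroska).
const FALLBACK_TARGET: AudioTarget = { muxer: "matroska", extension: "mka" };

function pickAudioTarget(codec: string): AudioTarget {
  return DIRECT_TARGETS.get(codec.toLowerCase()) ?? FALLBACK_TARGET;
}

// Builds an argument list that copies one stream without re-encoding.
// Assumes the WASM build takes the same arguments as the ffmpeg CLI.
function buildExtractArgs(
  inputPath: string,
  streamIndex: number,
  codec: string,
  outputBaseName: string
): string[] {
  const target = pickAudioTarget(codec);
  return [
    "-i", inputPath,
    "-map", `0:${streamIndex}`, // select a single stream from the first input
    "-c", "copy",               // no encoders or decoders are needed for a copy
    "-f", target.muxer,
    `${outputBaseName}.${target.extension}`
  ];
}
```

For example, `buildExtractArgs("input.mkv", 1, "opus", "track1")` would fall back to the `matroska` muxer and write `track1.mka`, since Opus has no dedicated muxer in this sketch.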

qgustavor commented 1 year ago

New version released.