qgustavor / mkv-extract

Extract MKV files online, directly from your browser
https://qgustavor.github.io/mkv-extract/
MIT License

Handle image subtitles #17

Closed ethicnology closed 2 years ago

ethicnology commented 2 years ago

Hi, I tried your tool to extract some subtitles from an MKV file and got this kind of unreadable output: [screenshot: Capture d’écran du 2021-11-19 12-14-04]

qgustavor commented 2 years ago

This project has been pretty much unmaintained (except for minor changes such as library updates) for two years. IIRC the hard part is done by node-ebml, which decodes the binary structure into simpler things such as strings and numbers, so output as corrupted as that is weird.

I quickly checked here and the issue is probably here: it might be returning more data than needed. The number of bytes to read is obtained using ebml.tools.readVint, so I think the line that handles this is fine.
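For readers unfamiliar with how the size of an EBML element is encoded, here is a minimal sketch of decoding a variable-length integer (vint) and using it to bound the payload. It is not node-ebml's implementation: `readVintSketch` and `sliceElementData` are hypothetical names, and the `{ length, value }` return shape is an assumption made for illustration.

```typescript
// Illustrative sketch only; readVintSketch and sliceElementData are
// hypothetical helpers, not part of node-ebml or mkv-extract.
interface Vint {
  length: number; // how many bytes the vint itself occupies
  value: number;  // decoded value with the length marker bit removed
}

function readVintSketch(bytes: Uint8Array, start = 0): Vint | null {
  if (start >= bytes.length) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(start);

  // The position of the first set bit in the first byte gives the length:
  // 1xxxxxxx is 1 byte, 01xxxxxx is 2 bytes, and so on up to 8 bytes.
  let length = 1;
  let mask = 0x80;
  while (length <= 8 && (first & mask) === 0) {
    mask >>= 1;
    length++;
  }
  if (length > 8) return null;                    // first byte was 0x00
  if (start + length > bytes.length) return null; // truncated input

  // Drop the marker bit, then append the remaining bytes big-endian.
  // Values above 2^53 lose precision in a JavaScript number.
  let value = first & (mask - 1);
  for (let i = 1; i < length; i++) {
    value = value * 256 + view.getUint8(start + i);
  }
  return { length, value };
}

// Return exactly `size` payload bytes that follow a size vint, never more.
function sliceElementData(bytes: Uint8Array, sizeOffset: number): Uint8Array | null {
  const size = readVintSketch(bytes, sizeOffset);
  if (size === null) return null;
  const dataStart = sizeOffset + size.length;
  const dataEnd = dataStart + size.value;
  if (dataEnd > bytes.length) return null; // payload not fully buffered yet
  return bytes.subarray(dataStart, dataEnd);
}
```

If a reader returned more data than needed, a check like the one in `sliceElementData` (end offset = start of data + decoded size) is where the extra bytes would show up.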

Anyway, this project is not maintained anymore because it is, in fact, an abandoned idea: I was trying to extract videos from MKV files and merge those into MP4 streams (remuxing) using JavaScript so they could be watched in browsers. I started small by just extracting tracks, but it turned out that the rest was harder than I expected, and I stopped this project five years ago. Two years ago I rewrote the code in TypeScript to try to reduce the number of bugs, but the code is still messy. If you have time to help fix this issue, then send a pull request.

ethicnology commented 2 years ago

Hi, thanks for this detailed answer. After trying multiple tools, I finally discovered that the input subtitles are image-based, which is why the output is weird. I'm going to follow this Stack Exchange answer.

qgustavor commented 2 years ago

Subtitle Edit can OCR image subtitles and convert them to other formats. You can even automate it; I use it in a script of mine and it is easy. I will reopen this issue, as this tool should not extract image-based subtitles (DVB, DVD, PGS, and many others) as if they were SRT.

I will not fix this issue myself. I will only leave it open in case I remember to fix it in the future or someone sends a pull request fixing it. I don't really know how to fix it: skipping those subtitle tracks would be the lazy solution, but it would be better to add some code to handle at least some popular image subtitle formats, such as DVD and PGS subtitles.
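As a rough illustration of the "lazy solution", the sketch below separates image-based subtitle tracks from text ones by their Matroska codec ID. The `SubtitleTrack` shape and the function names are hypothetical and do not come from this project's code. The codec ID strings (`S_VOBSUB` for DVD, `S_HDMV/PGS` for PGS, `S_DVBSUB` for DVB) are the ones defined by the Matroska codec mappings, and the list is deliberately incomplete.

```typescript
// Hypothetical track shape for illustration; the real project may
// represent tracks differently.
interface SubtitleTrack {
  trackNumber: number;
  codecId: string; // Matroska CodecID, e.g. "S_TEXT/UTF8"
}

// Image-based subtitle codec IDs mentioned in this thread.
// Not exhaustive: other bitmap formats exist.
const IMAGE_SUBTITLE_CODEC_IDS: string[] = [
  "S_VOBSUB",   // DVD (VobSub) subtitles
  "S_HDMV/PGS", // Blu-ray PGS subtitles
  "S_DVBSUB",   // DVB subtitles
];

function isImageSubtitle(track: SubtitleTrack): boolean {
  return IMAGE_SUBTITLE_CODEC_IDS.indexOf(track.codecId) !== -1;
}

// Split tracks so that only text tracks are written out as SRT-like
// files; image tracks can be skipped or reported to the user.
function splitSubtitleTracks(tracks: SubtitleTrack[]): {
  text: SubtitleTrack[];
  image: SubtitleTrack[];
} {
  const text: SubtitleTrack[] = [];
  const image: SubtitleTrack[] = [];
  for (const track of tracks) {
    if (isImageSubtitle(track)) {
      image.push(track);
    } else {
      text.push(track);
    }
  }
  return { text, image };
}
```

With a split like this, the tool could at least tell the user that a track is image-based and needs OCR (for example with Subtitle Edit) instead of producing unreadable output.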