kaegi / alass

"Automatic Language-Agnostic Subtitle Synchronization"
GNU General Public License v3.0

Read audio from video directly, without extracting #30

Closed: skorokithakis closed this issue 3 years ago

skorokithakis commented 3 years ago

Currently, alass extracts the audio from the video before processing it, and this extraction step takes most of the running time. If alass could read the audio directly from the video, it would presumably be at least twice as fast.

I imagine there are some video libraries that can be used to do this, but I don't know of any.

kaegi commented 3 years ago

This is more complicated than it seems, and "extracting" might be a misnomer here. The "extraction" step is actually "reading from the video/audio file AND decompressing the compressed MP3/AAC/Vorbis stream into raw mono 8 kHz audio". This uncompressed mono 8 kHz audio is what the voice-activity detection module needs. The bytes stored in the video/audio file cannot be used directly, since they are not in this specific format.
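To make the "conversion" half of that step concrete, here is a minimal Python sketch of turning already-decoded PCM into mono 8 kHz samples. It assumes the MP3/AAC/Vorbis stream has already been decoded into interleaved float samples (that decoding is the part ffmpeg's codecs handle and is not shown). The function names are hypothetical, and the block-averaging resampler is a deliberately crude stand-in for a real anti-aliasing resampler; it only handles integer rate ratios such as 48000 to 8000.

```python
def downmix_to_mono(interleaved, channels):
    """Average each frame of `channels` interleaved samples into one mono sample."""
    if channels <= 0 or len(interleaved) % channels != 0:
        raise ValueError("sample count must be a positive multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) / channels
        for i in range(0, len(interleaved), channels)
    ]


def decimate_to_rate(mono, src_rate, dst_rate=8000):
    """Lower the sample rate by an integer factor by averaging each block.

    Block averaging is only a crude low-pass filter; a real resampler
    (such as ffmpeg's) uses proper anti-aliasing filters and also handles
    non-integer ratios like 44100 -> 8000.
    """
    if dst_rate <= 0 or src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = src_rate // dst_rate
    usable = len(mono) - len(mono) % factor  # drop a trailing partial block
    return [
        sum(mono[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


def to_vad_input(interleaved, channels, src_rate):
    """Downmix, then resample to 8 kHz mono (the format the VAD step wants)."""
    return decimate_to_rate(downmix_to_mono(interleaved, channels), src_rate)
```

For example, 12 interleaved stereo frames at 48 kHz become 2 mono samples at 8 kHz, while 44.1 kHz input is rejected because 44100 / 8000 is not an integer. Handling that case properly is one reason to lean on a mature resampler rather than hand-rolled code.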

Currently this reading, decompressing and converting is done by invoking the (highly optimized!) ffmpeg, which conveniently also supports practically all container and audio formats. ffmpeg is spawned as a sub-process and its output is read directly from its STDOUT, so no audio file is actually extracted or written anywhere on the disk.

It is also possible to link against ffmpeg and use it as a library. I have already done this (see the README section), but there are some legal issues with it, it takes slightly more effort to do correctly, and it makes compiling the project much more cumbersome. As the README says, this is a little faster, but not by much.
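The subprocess approach described above can be sketched in Python as follows, assuming an `ffmpeg` binary on PATH. The function names are hypothetical and the exact flags alass passes to ffmpeg may differ; the options used here (`-vn`, `-ac`, `-ar`, `-f s16le`, and `-` for stdout) are standard ffmpeg command-line options. The point is that the decoded audio travels through a pipe, never through a temporary file.

```python
import array
import subprocess
import sys


def pcm_s16le_to_samples(raw):
    """Interpret raw signed 16-bit little-endian PCM bytes as integer samples."""
    if len(raw) % 2 != 0:
        raise ValueError("s16le data must have an even number of bytes")
    samples = array.array("h")  # signed short; 2 bytes on common platforms
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian regardless of host
    return samples


def read_mono_8khz(path):
    """Decode any ffmpeg-readable file to 8 kHz mono PCM via a subprocess.

    The audio is piped through ffmpeg's stdout, so nothing is written to disk.
    Assumes an `ffmpeg` binary on PATH; raises FileNotFoundError if it is
    missing and RuntimeError if ffmpeg fails.
    """
    cmd = [
        "ffmpeg",
        "-loglevel", "error",  # only report real errors on stderr
        "-i", path,            # input: any container/codec ffmpeg supports
        "-vn",                 # skip video streams entirely
        "-ac", "1",            # downmix to a single channel
        "-ar", "8000",         # resample to 8 kHz
        "-f", "s16le",         # raw signed 16-bit little-endian PCM
        "-",                   # write to stdout instead of a file
    ]
    proc = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # stop ffmpeg from waiting for keyboard input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    if proc.returncode != 0:
        message = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"ffmpeg failed ({proc.returncode}): {message}")
    return pcm_s16le_to_samples(proc.stdout)
```

Usage would look like `samples = read_mono_8khz("movie.mkv")`, after which `len(samples) / 8000` is the audio duration in seconds. For simplicity this sketch buffers all of ffmpeg's output in memory; reading `proc.stdout` in chunks from a `subprocess.Popen` would keep memory bounded for long videos.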

If there is another way to do the reading, decompressing and conversion and save time, feel free to reopen the issue!

skorokithakis commented 3 years ago

Ah okay, that makes sense, thanks!