kaegi / alass

"Automatic Language-Agnostic Subtitle Synchronization"
GNU General Public License v3.0

Troubleshooting wrong alignment #7

Open davidde opened 4 years ago

davidde commented 4 years ago

I was wondering how the language-agnostic part works, since in my first few quick tests it generated totally wrong output for Dutch subtitles, but perfect output for English subs. The Dutch output had the first 5 subtitles all starting at 00:00:00,000, and all subsequent subtitles were far too early compared to the audio.

I guess this could still be caused by some variable other than language, since I only tested 2 files. Which makes me wonder: is there a --verbose switch or anything that can help me debug this? How do you recommend approaching this issue?
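One way to quantify the symptom described above (several subtitles all starting at 00:00:00,000) is to count them in the generated file. The following is only a minimal sketch: it assumes plain SRT timing lines, and the function names `cue_starts_ms` and `count_clamped_to_zero` are made up for illustration and are not part of alass.

```python
import re

# Matches the start timestamp of an SRT timing line such as
# "00:00:01,500 --> 00:00:03,000".
CUE_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def cue_starts_ms(srt_text):
    """Return the start time of every cue in milliseconds, in file order."""
    starts = []
    for line in srt_text.splitlines():
        match = CUE_RE.match(line.strip())
        if match:
            hours, minutes, seconds, millis = (int(g) for g in match.groups())
            starts.append(((hours * 60 + minutes) * 60 + seconds) * 1000 + millis)
    return starts


def count_clamped_to_zero(srt_text):
    """Count cues whose start time is exactly 00:00:00,000."""
    return sum(1 for start in cue_starts_ms(srt_text) if start == 0)
```

More than one cue starting at zero suggests that the computed shift pushed the first lines before the start of the file.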

Really great project btw, and thumbs up on MPV! It is also my main media player on Linux ;)

kaegi commented 4 years ago

This means that the Dutch subtitle is "more different" from the result of the voice-activity detection than the English subtitle (many more extra/missing lines?). It would be interesting to know which movie you used. Movies with more action scenes and loud background music have a higher chance of failure than "quiet" ones.

If you know that the framerate is correct in the original subtitle file, you could try --disable-fps-guessing. Framerate guessing is usually the step that goes wrong.

In this case you can use a trick: align the wrong Dutch subtitle to the corrected English subtitle (without any other flags). This has a very high chance of success.
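The two suggestions above can be sketched as command lines. This is a hedged sketch, not an official recipe: the file names are placeholders, the executable is assumed to be called `alass` as elsewhere in this thread (adjust if your install uses a different name), and `align_command` and `two_step_commands` are helper names invented here. Only the positional order (reference, incorrect subtitle, output) and the --disable-fps-guessing flag come from the discussion.

```python
import shlex


def align_command(reference, incorrect, output,
                  disable_fps_guessing=False, binary="alass"):
    """Build one alass invocation: reference, incorrect subtitle, output."""
    cmd = [binary, reference, incorrect, output]
    if disable_fps_guessing:
        # Only sensible when the framerate of the input subtitle is known
        # to be correct.
        cmd.append("--disable-fps-guessing")
    return cmd


def two_step_commands(movie, english_in, english_out, dutch_in, dutch_out):
    """First fix the English subtitle against the movie, then align the
    Dutch subtitle against the corrected English one (no extra flags)."""
    return [
        align_command(movie, english_in, english_out),
        align_command(english_out, dutch_in, dutch_out),
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for cmd in two_step_commands("Movie.mp4", "en.srt", "en.fixed.srt",
                                 "nl.srt", "nl.fixed.srt"):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` directly.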

There is no --verbose flag or anything. If the framerate guessing is indeed the step that went wrong, printing the scores for the 7 tested framerate ratios might provide some insight (giving the confidence of the guess). I don't think there is any other usable information for a human.

kaegi commented 4 years ago

The "block of XXX subtitles shifted by XXX" lines give an impression of how many splits the algorithm makes and how far apart they are placed.

kaegi commented 4 years ago

And I forgot: there is a special mode to debug the voice-activity detection by using an underscore in place of the input subtitle!

alass Movie.mp4 _ voiceactivity.srt

This generates a subtitle containing the timespans where speech is likely. It is usually not that accurate (given music or background noise), but there should be enough lines that correspond to valid dialog.
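To judge how well a subtitle file matches the generated voiceactivity.srt, one option is to measure how much of the subtitle time falls inside the detected speech spans. The sketch below is an assumption-laden illustration and not how alass scores alignments internally: `parse_spans`, `merge_spans` and `covered_fraction` are hypothetical helpers, and only plain SRT timing lines are handled.

```python
import re

SPAN_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_spans(srt_text):
    """Return (start_ms, end_ms) for every timing line in an SRT document."""
    spans = []
    for line in srt_text.splitlines():
        match = SPAN_RE.match(line.strip())
        if match:
            g = [int(x) for x in match.groups()]
            spans.append((to_ms(*g[:4]), to_ms(*g[4:])))
    return spans


def merge_spans(spans):
    """Merge overlapping or touching spans so no time is counted twice."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(subtitle_spans, speech_spans):
    """Fraction of total subtitle duration that overlaps detected speech."""
    speech = merge_spans(speech_spans)
    total = sum(end - start for start, end in subtitle_spans)
    if total == 0:
        return 0.0
    covered = 0
    for start, end in subtitle_spans:
        for v_start, v_end in speech:
            covered += max(0, min(end, v_end) - max(start, v_start))
    return covered / total
```

Comparing this fraction for the original and the aligned subtitle gives a rough indication of whether the alignment moved the lines towards or away from the detected speech.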

davidde commented 4 years ago

Ok, thanks for the pointers. The cases I mentioned were not generated from the same movie. I've now tried synchronizing Dutch subtitles of the movie that generated the perfect English output, and it also generated a bad Dutch output. So that at least seems to suggest it is less reliable for non-English subs, which is weird since voice detection should be no different for non-English.

When I based the Dutch output on English reference subs, output was much better, though not completely flawless. If I can find the time, I might do some more testing with more subs/languages to see if I can narrow the problem down.

kaegi commented 4 years ago

Another thing you should try is --split-penalty with values like 1, 2, 5, 10, 20, 30 or 50. It might be that the split penalty is too low (or too high; the default is 7, which is rather low). Your case seems very strange; I hope you can find the problem.

Using the voiceactivity.srt as the reference is equivalent to using the movie audio. It just skips the extraction step, so you can play around with the values faster.
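A sweep over the split-penalty values mentioned above, with voiceactivity.srt as the reference, could be scripted roughly as follows. This is a sketch under assumptions: the output naming scheme and the helper `sweep_commands` are invented here, and the executable name `alass` is taken from the command earlier in this thread.

```python
import shlex

# Values suggested in the comment above; the default is 7.
PENALTIES = [1, 2, 5, 10, 20, 30, 50]


def sweep_commands(reference, incorrect, out_dir,
                   penalties=PENALTIES, binary="alass"):
    """Build one alass command per split penalty, each with its own output."""
    commands = []
    for penalty in penalties:
        output = f"{out_dir}/output.penalty-{penalty}.srt"  # hypothetical naming
        commands.append(
            [binary, reference, incorrect, output,
             "--split-penalty", str(penalty)]
        )
    return commands


if __name__ == "__main__":
    # Placeholder paths; voiceactivity.srt stands in for the movie audio.
    for cmd in sweep_commands("voiceactivity.srt", "nl.srt", "out"):
        print(shlex.join(cmd))
```

Each resulting file can then be checked in the player to see which penalty gives the best result.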

davidde commented 4 years ago

Great, thanks for the help. I'll see if I can get better matching subs.