readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Sync problems with large files #288

Open pombotron opened 2 years ago

pombotron commented 2 years ago

I'm having an issue where my subs are perfect up to a point, then one of the sub lines lasts a few seconds too long, and from there on the whole thing drifts further and further out of sync. The point at which it goes out of sync varies from file to file, but it always happens, generally around 26-40 minutes in from what I've observed. It also seems to happen sooner the longer the file is. Here is a dump of the program's output for a job where this issue is present: dump.txt

My machine has 64GB of RAM and the file I'm currently trying to work on is 4 hours long.

Some of the stuff I've tried to see if it makes any difference:

- Various different versions of Ubuntu and Python
- Using the es-la language code instead of es
- Using the espeak-ng wrapper instead of the espeak wrapper
- Replacing all accented characters in the text with standard English ones

But none of this has made any difference so far.

Has anyone else experienced an issue like this or have any guesses on how it might be fixed?

ozdefir commented 2 years ago

You can try doubling the dtw_margin runtime parameter. Just add this before the output filename in the command line: -r="dtw_margin=120" It will double the RAM usage but your 64 GB should be enough.
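For anyone scripting this, here is a minimal sketch of how the suggested option could be slotted into the command line, with the -r argument placed before the output filename as described above. The file names and the task configuration string (task_language, is_text_type, os_task_file_format) are assumptions for illustration, not taken from the original report, and the helper functions are hypothetical.

```python
import shlex


def runtime_config(**params):
    """Join runtime parameters into the 'key=value|key=value' form passed to -r."""
    return "|".join(f"{key}={value}" for key, value in params.items())


def build_command(audio, text, task_config, output, **runtime_params):
    """Return an argument list for aeneas.tools.execute_task.

    The -r option goes just before the output filename.
    """
    cmd = ["python", "-m", "aeneas.tools.execute_task", audio, text, task_config]
    if runtime_params:
        cmd.append(f"-r={runtime_config(**runtime_params)}")
    cmd.append(output)
    return cmd


# Hypothetical file names and task settings, for illustration only.
cmd = build_command(
    "audio.mp3",
    "subtitles.txt",
    "task_language=es|is_text_type=subtitles|os_task_file_format=srt",
    "out.srt",
    dtw_margin=120,
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can also be handed directly to subprocess.run(cmd), which avoids having to quote the "|" characters for the shell.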

pombotron commented 2 years ago

Thanks very much for the suggestion. I'll try this when I get home from work and let you know the results.

pombotron commented 2 years ago

Changing this setting to 120 pushed the desync point back to 1:51:00, and increasing it further to 180 let me get the full 4-hour clip done successfully, so thank you very much for the help. Do you know of any other settings off the top of your head that may be relevant to this issue? I'm thinking that for even longer files I might still run into problems, as I won't be able to crank it much higher without running out of memory.

ozdefir commented 2 years ago

If you want to reduce the memory usage without sacrificing much quality, you can increase the window_shift, like so: -r="dtw_margin=180|window_shift=0.060"

Setting it to 0.060 would cut down the memory usage by half. I wouldn't go above 0.100 though, as that seems to be the point where the quality begins to decline substantially.
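To compare combinations before committing to a multi-hour run, the two rules of thumb in this thread can be put into a rough estimate. This is only a sketch under stated assumptions: that memory grows linearly with dtw_margin (doubling it doubles RAM, as noted above) and falls with the square of window_shift (which gives a bit less than half at 0.060, in line with the figure above). The baseline values of 60 for dtw_margin and 0.040 for window_shift are assumed defaults, and relative_memory is a hypothetical helper, not part of aeneas.

```python
def relative_memory(dtw_margin, window_shift, base_margin=60.0, base_shift=0.040):
    """Estimate memory use relative to the assumed default settings.

    Assumed model (not confirmed by the aeneas documentation):
    memory is proportional to dtw_margin / window_shift ** 2.
    """
    margin_factor = dtw_margin / base_margin
    shift_factor = (base_shift / window_shift) ** 2
    return margin_factor * shift_factor


# Settings mentioned in this thread, compared against the assumed defaults.
for margin, shift in [(120, 0.040), (180, 0.040), (180, 0.060), (180, 0.100)]:
    ratio = relative_memory(margin, shift)
    print(f"dtw_margin={margin} window_shift={shift:.3f} -> about {ratio:.2f}x baseline")
```

Treat the result as a ratio for comparing settings rather than a prediction in gigabytes. Actual usage also depends on the length of the audio and should be checked on a real run.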