r4victor / afaligner

📈 A forced aligner intended for synchronization of narrated text
MIT License

Progressively out-of-sync results as the length of the audio increases #11

Open versae opened 1 year ago

versae commented 1 year ago

I have been using afaligner to sync Norwegian audiobooks, but as the audio gets longer (6+ hours), the misalignment gets worse. As you read through the book page by page, the synchronization is pretty good at the beginning but barely usable towards the end of the book.

I've seen instances of:

The problem also exists in aeneas, for example (https://github.com/readbeyond/aeneas/issues/271, https://github.com/readbeyond/aeneas/issues/288), where a user suggests that changing the VAD threshold might help. Is that an option for afaligner? Maybe changing the TTS could also help (https://github.com/r4victor/afaligner/issues/10).

r4victor commented 1 year ago

@versae, do you mean the problem occurs when the overall duration of the book is >6 hours, or when individual audio segments are that long? I never tested afaligner with such long segments, since it was designed to sync audiobooks split chapter by chapter, so don't expect that to work. But if you mean the overall duration, it should not make much difference, and syncing should work fine as long as each audio file is of reasonable length (say, up to an hour). This may not be the case for non-English books. Maybe you can hardcode a different TTS and see if it works better for you.
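
If the audio really is one long file, a rough way to get it under the "up to an hour" guideline is to plan cut points before splitting. The sketch below is only an illustration: `plan_segments` is a hypothetical helper, not part of afaligner, and it uses naive equal-length cuts. In practice the cuts would need to be moved to chapter boundaries or silences, and the text split to match.

```python
import math


def plan_segments(total_seconds, max_seconds=3600.0):
    """Return (start, end) pairs in seconds, each no longer than max_seconds.

    Hypothetical helper: it only computes equal-length cut points and
    does not read or write any audio.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    if total_seconds <= 0:
        return []
    # Smallest number of pieces that keeps every piece within the limit.
    count = math.ceil(total_seconds / max_seconds)
    length = total_seconds / count
    # Pin the final boundary to the exact total to avoid float drift.
    bounds = [i * length for i in range(count)] + [total_seconds]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # A 6.5-hour audiobook, as in the report above.
    for start, end in plan_segments(6.5 * 3600):
        print(f"{start / 60:7.1f} min -> {end / 60:7.1f} min")
```

Each resulting piece could then be exported as its own audio file and paired with the matching part of the text, which is the chapter-by-chapter layout described above.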