readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Alignment at word level? #182

Closed pietrop closed 7 years ago

pietrop commented 7 years ago

First of all congrats on such an awesome project @readbeyond!

I was reading through the How it works section and was wondering whether you think it would be possible to get word-level alignment, with timings for each word, using the system you describe?

Something similar to the output of the Gentle forced aligner, which, however, uses automated transcription as an intermediate step to generate the alignment.

I am working on autoEdit.io, an open source text-based audio/video editing system. Some users use it to make captions, and I was looking into integrating aeneas as an option to provide a better, more usable SRT, as well as to let users edit the text freely and then re-align it.

readbeyond commented 7 years ago

Hi,

Alberto here, the main dev of aeneas (and previously ReadBeyond).

Thank you for your interest in aeneas. You might want to read this paragraph:

https://github.com/readbeyond/aeneas/#a-note-on-word-level-alignment

and the note "If you are interested in synchronizing at word granularity" of

https://www.readbeyond.it/aeneas/docs/clitutorial.html
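
As a rough illustration of what word granularity means on the input side: aeneas plain text files carry one fragment per line, so a word-level input can be prepared by putting each word on its own line. The sketch below is only an assumption about how one might do that preprocessing; the helper name words_to_fragments is hypothetical and not part of aeneas, and the word-specific tuning parameters are described in the tutorial note linked above rather than shown here.

```python
def words_to_fragments(transcript):
    """Return the transcript with one whitespace-delimited word per line.

    Hypothetical helper (not part of aeneas): each output line is meant
    to become one text fragment in a plain text input file.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so repeated spaces and line breaks in the transcript are collapsed.
    return "\n".join(transcript.split())


if __name__ == "__main__":
    print(words_to_fragments("From fairest creatures\nwe desire increase"))
```

The resulting text would then be saved to a file and passed to aeneas as a plain text input (is_text_type=plain in the task configuration string), together with the audio file.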

There are also a few threads on the subject in the aeneas mailing list:

https://groups.google.com/d/forum/aeneas-forced-alignment

In short, the end user decides the granularity of the synchronization and they can opt for word-level granularity. However, aeneas was designed for CC/phrase/sentence-level granularity, and the algorithmic approach behind aeneas (i.e., MFCC + DTW) is not as robust for word-level alignment as the approaches used by other aligners (HMMs/GMMs, neural nets). aeneas offers a few parameters to improve word-level sync, but I believe the limitations are pretty much intrinsic to the MFCC + DTW algorithm.
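
To make the MFCC + DTW idea more concrete, here is a toy sketch in Python. It is not aeneas code. It assumes the general scheme from the How it works section (the text is synthesized, the synthetic audio is aligned to the real audio with DTW, and fragment boundaries are mapped through the warping path). Single numbers stand in for MFCC vectors, and the names dtw_path and map_boundaries are made up for illustration.

```python
def dtw_path(synth, real):
    """Return a minimum-cost DTW warping path as (synth_index, real_index) pairs."""
    n, m = len(synth), len(real)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of synth[:i] with real[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Toy frame distance; a real aligner compares MFCC vectors.
            d = abs(synth[i - 1] - real[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_boundaries(path, synth_boundaries):
    """Map each synthetic-frame boundary to the first real frame aligned to it."""
    first_real = {}
    for s, r in path:
        first_real.setdefault(s, r)
    return [first_real[s] for s in synth_boundaries]


if __name__ == "__main__":
    synth = [0, 0, 5, 5, 0, 0]          # three fragments of two frames each
    real = [0, 0, 0, 5, 5, 5, 5, 0, 0]  # same content, spoken more slowly
    print(map_boundaries(dtw_path(synth, real), [0, 2, 4]))
```

In this toy case the fragments are long and acoustically distinct, so the boundaries are recovered cleanly. With word-level fragments each fragment spans only a few frames, so a small deviation in the warping path is large relative to the fragment length, which is one way to see why the limitation is tied to the approach itself.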

There is a tradeoff: aeneas was designed to work on many languages, without (heavy) language models, and to be easy to install and run. Other aligners come from academia (= state of the art) or are commercial products that are heavily engineered, usually as by-products of speech recognition systems.

Thank you for the pointer about autoEdit.io. I will have a look as soon as possible.

Alberto Pettarin

pietrop commented 7 years ago

Thanks for the prompt reply, Alberto, that's awesome. I will check out all the links. I am already getting ideas on how I could integrate with autoEdit, and will let you know if I have any further questions.

readbeyond commented 7 years ago

You are welcome. If you have further questions that might benefit others (i.e., not confidential or specific), please feel free to post them on the aeneas mailing list. Otherwise shoot an email to my private address.

Also, you might want to check the aeneas Web app ( https://aeneasweb.org ).

Best regards,

Alberto Pettarin