readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Is there a way to add vocabulary? #283

Open RomanADavis opened 2 years ago

RomanADavis commented 2 years ago

I have a project that involves doing forced alignment on the Bible. Is there a way to add biblical vocabulary to your model?

chrisvaughn commented 2 years ago

Hi Roman. I'm using aeneas to do forced alignment of Bibles for YouVersion. There is no model in aeneas, so there is no vocabulary to add. It works differently from other forced aligners, like CMU Sphinx: it synthesizes the text with a TTS engine and then aligns that synthetic audio to your recording using dynamic time warping (DTW) over MFCC features. If the text and audio are identical and there is a decent TTS engine for the language, you should be able to get good results. This doc is a good read if you haven't looked at it yet: https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md
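To make the "no model" point concrete, here is a toy sketch of DTW, the kind of alignment the HOWITWORKS doc describes. It is not aeneas's actual code: aeneas works on MFCC vectors with an optimized implementation, while this uses made-up one-dimensional numbers, and the names `dtw_path`, `synthetic`, and `recorded` are purely illustrative.

```python
def dtw_path(a, b):
    """Return the minimum-cost warping path between two 1-D sequences.

    The path is a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Walk back from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


# Hypothetical frames: "synthetic" stands in for the TTS audio,
# "recorded" for a slower human reading of the same text.
synthetic = [0, 1, 5, 1, 0]
recorded = [0, 0, 1, 1, 5, 5, 1, 0]
path = dtw_path(synthetic, recorded)
```

Each pair in `path` says which frame of the synthetic audio lines up with which frame of the recording. Because the text position of every synthetic frame is known, the path carries the text timing over to the real audio. Nothing here depends on a vocabulary, only on the TTS output sounding enough like the recording.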

Also, SIL's Scripture App Builder supports aeneas. Their documentation on how they use it is worth a look: https://software.sil.org/downloads/r/scriptureappbuilder/Scripture-App-Builder-07-Using-aeneas-for-Audio-Text-Synchronization.pdf