readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Silence in audio makes the alignment inaccurate #265

Closed sunjianan9900 closed 3 years ago

sunjianan9900 commented 3 years ago

Hey, it's a good project, but I ran into a problem: when someone speaks slowly and leaves long silences between words (because the speaker is thinking), the alignment does not match well.

For example:

The sentence is: 今天天气不错 我们来学习第一课 (today is a nice day, let's learn class one)

But the speaker reads it as: 今天 (silence 1s) 天气不错 我们 (silence 1s) 来 (silence 1s) 学习 (silence 1s) 第一课 / today (1s) is a nice day let's (1s) learn (1s) class (1s) one

In this case the alignment is not accurate. It becomes:

srt: 今天天气不错 | 我们来学习第一课 ...
voice: 今天 | 天气不错 ...

srt: today is a nice day | let's learn class one ...
voice: today | is a nice day ...

It seems the silence is so long that the words before it are treated as a complete sentence, which they are not.

To test this, I cut out all the silent parts and retried, and it works well. The problem also does not appear in short audio files; it only shows up when the audio is longer than about 10 minutes.

How can I tell aeneas to ignore the silence? Cutting the silence out and then adding the time differences back into the SRT at the end is a lot of extra work.
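For reference, the manual workaround described above (cut the silence, align, then shift the timestamps back) could be sketched roughly as follows. This is only an illustration, not part of aeneas: `build_offset_table` and `to_original_time` are hypothetical helper names, and the list of removed intervals is assumed to come from whatever tool was used to cut the silence.

```python
from bisect import bisect_right


def build_offset_table(removed):
    """Build a lookup table from the silence intervals that were cut.

    `removed` is a sorted list of non-overlapping (start, end) intervals,
    in seconds on the ORIGINAL audio timeline.
    Returns a list of (position_in_trimmed_audio, total_seconds_removed_so_far).
    """
    table = []
    cumulative = 0.0
    for start, end in removed:
        trimmed_pos = start - cumulative  # where this cut sits in the trimmed audio
        cumulative += end - start         # total silence removed up to and including this cut
        table.append((trimmed_pos, cumulative))
    return table


def to_original_time(t, table):
    """Map a timestamp `t` (seconds, trimmed audio) back to the original audio.

    A timestamp that falls exactly on a cut is mapped to the point AFTER
    the removed silence (a design choice of this sketch).
    """
    positions = [pos for pos, _ in table]
    i = bisect_right(positions, t)
    return t if i == 0 else t + table[i - 1][1]


# Example: two 1-second silences were cut from the original audio.
table = build_offset_table([(1.0, 2.0), (4.0, 5.0)])
print(to_original_time(1.5, table))
```

Each begin/end time in the SRT produced from the trimmed audio would be passed through `to_original_time` before writing the final file.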

readbeyond commented 3 years ago

Try with https://www.readbeyond.it/aeneas/docs/runtimeconfiguration.html#aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH
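A minimal sketch of how that option might be passed on the command line is below. It only builds the argument list and does not import aeneas. The assumptions here are that the runtime key is the lowercase string `mfcc_mask_nonspeech`, that runtime options are passed with `-r=` as `key=value` pairs joined by `|`, and that the task configuration keys shown are the right ones for SRT output; please check all of these against the linked documentation. The file names and `build_command` itself are placeholders.

```python
def build_rconf_string(options):
    """Join runtime options as 'key=value|key=value' (assumed aeneas format)."""
    return "|".join(f"{key}={value}" for key, value in options.items())


def build_command(audio, text, output, language):
    """Return an argument list for aeneas.tools.execute_task with non-speech masking on."""
    # Task configuration: language code, plain text input, SRT output (assumed keys).
    config = f"task_language={language}|is_text_type=plain|os_task_file_format=srt"
    # Runtime configuration: ask aeneas to mask non-speech (silence) frames.
    rconf = build_rconf_string({"mfcc_mask_nonspeech": True})
    return [
        "python", "-m", "aeneas.tools.execute_task",
        audio, text, config, output,
        f"-r={rconf}",
    ]


# Placeholder file names and language code; replace with your own.
cmd = build_command("audio.mp3", "text.txt", "out.srt", "eng")
print(" ".join(cmd))
# To actually run it (requires aeneas to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```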