Silence in audio makes the alignment inaccurate

Hey, it's a good project, but I found a problem: when someone speaks slowly and there are many silences between words (because the speaker is thinking), the alignment does not match well.

For example:

The sentence is: 今天天气不错我们来学习第一课 (today is a nice day, let's learn class one)

But the speaker reads it like this: 今天 (silence 1s) 天气不错我们 (silence 1s) 来 (silence 1s) 学习 (silence 1s) 第一课, i.e. today (1s) is a nice day let's (1s) learn (1s) class (1s) one

In this case the alignment is not accurate. It becomes:

srt:   今天天气不错 | 我们来学习第一课 ...
voice: 今天 | 天气不错 ...

srt:   today is a nice day | let's learn class one ...
voice: today | is a nice day ...

It seems the silence is long enough that it gets treated as the end of a sentence, which it is not.

To test this, I cut out all the silent parts and retried, and it worked well. The problem also does not appear with short audio files; it only shows up when the audio is longer than about 10 minutes.

How can I tell aeneas to ignore the silence? Otherwise there is a lot of extra work: cutting the silence out, and then adding the time differences back into the SRT at the end.
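
For reference, the manual workaround (cut the silence, align the trimmed audio, then shift the timestamps back) could be sketched roughly like this. It is only an illustration: the function names are made up and are not part of aeneas, and it assumes the removed silence intervals are already known (in seconds, on the original timeline) from whatever tool did the cutting.

```python
from bisect import bisect_right


def build_offset_table(removed):
    """Precompute where each silence was cut and how much time is gone by then.

    `removed` is a sorted, non-overlapping list of (start, end) silence
    intervals in ORIGINAL audio time, in seconds (assumed to be known).
    """
    cut_points = []  # position of each cut on the TRIMMED timeline
    offsets = []     # total removed duration up to and including that cut
    total = 0.0
    for start, end in removed:
        cut_points.append(start - total)
        total += end - start
        offsets.append(total)
    return cut_points, offsets


def trimmed_to_original(t, cut_points, offsets):
    """Map a time on the trimmed audio back to the original audio.

    A time exactly on a cut point is mapped to the END of the removed
    silence, which suits fragment start times.
    """
    i = bisect_right(cut_points, t)
    return t + (offsets[i - 1] if i else 0.0)


def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


# Example: silences at 2.0-3.0 s and 5.0-6.5 s were cut from the original.
cut_points, offsets = build_offset_table([(2.0, 3.0), (5.0, 6.5)])
restored = trimmed_to_original(2.5, cut_points, offsets)
print(format_srt_time(restored))
```

Each start and end time in the SRT produced from the trimmed audio would have to go through `trimmed_to_original` before being written out, which is the extra work I would like to avoid.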

readbeyond / aeneas, issue #265