Hey, this is a good project, but I have run into a problem.
When someone speaks slowly and there are many silences between words (because the speaker is thinking), the alignment does not match well.
For example, the text is:
今天天气不错
我们来学习第一课
today is a nice day
let's learn class one
but the speaker reads it like this:
今天 (silence 1s) 天气不错
我们 (silence 1s) 来 (silence 1s) 学习 (silence 1s) 第一课
today (silence 1s) is a nice day
let's (silence 1s) learn (silence 1s) class (silence 1s) one
In this case the alignment is not accurate. It becomes:
srt: 今天天气不错 | 我们来学习第一课 ...
voice: 今天 | 天气不错 ...
srt: today is a nice day | let's learn class one ...
voice: today | is a nice day ...
It seems the silence is so long that the text before it is treated as a complete sentence, which it is not.
To test this, I cut out all the silent parts and retried, and it worked well.
The problem does not appear with a short audio file; it only shows up when the audio is longer than about 10 minutes.
How can I tell aeneas to ignore the silence?
Otherwise there is a lot of extra work: cutting the silence out, and then adding the time differences back into the SRT at the end.
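
To show what that extra work looks like, here is a rough sketch of the "add the time differences back" step. It does not use any aeneas API. It assumes the removed silence intervals are already known (from whatever tool cut them) as sorted, non-overlapping (start, end) pairs in seconds on the original timeline. The function names (build_offset_table, to_original_time, format_srt_time) are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def build_offset_table(removed):
    """Turn removed silences (original timeline) into lookup tables.

    cut_points[i] is where the i-th silence was cut, measured on the
    trimmed timeline; cumulative[i] is the total silence removed up to
    and including the i-th cut.
    """
    cut_points, cumulative = [], []
    total = 0.0
    for start, end in removed:
        cut_points.append(start - total)
        total += end - start
        cumulative.append(total)
    return cut_points, cumulative


def to_original_time(t, cut_points, cumulative, is_end=False):
    """Map a timestamp on the trimmed audio back to the original audio.

    A fragment that starts exactly at a cut begins after the silence;
    a fragment that ends exactly at a cut stops before the silence.
    """
    if is_end:
        n = bisect_left(cut_points, t)
    else:
        n = bisect_right(cut_points, t)
    return t + (cumulative[n - 1] if n else 0.0)


def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


# Hypothetical numbers: 今天 is spoken at 0.0-0.5s, then 1s of silence,
# then 天气不错 at 1.5-2.5s. After trimming, the whole sentence spans
# 0.0-1.5s, so that is the fragment the aligner would report.
removed = [(0.5, 1.5)]
cut_points, cumulative = build_offset_table(removed)
start = to_original_time(0.0, cut_points, cumulative)
end = to_original_time(1.5, cut_points, cumulative, is_end=True)
print(format_srt_time(start), "-->", format_srt_time(end))
```

Each start and end time from the SRT produced on the trimmed audio would go through to_original_time before being written out again, which is the step I would like to avoid.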