xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0

Progress Information possible #30

Open steveway opened 3 years ago

steveway commented 3 years ago

Hi, this is working quite well now in Papagayo-NG. I wanted to know whether it is possible to get progress information while recognition is running, because larger input files can take a while. If not, I will probably try slicing the input files into smaller segments at silence gaps and running them in series, so that I can show an approximate progress status. However, the slicing might change the recognizer's result.

xinjli commented 3 years ago

The most time-consuming part is inside PyTorch's forward pass, and I don't think there is any feature that lets you look into its progress.

However, I think you can get a very good estimate from the audio length. The inference time is almost exactly proportional to the audio length. If you know how long 1 second of audio takes to process, you can estimate the progress for much longer audio based on its length.
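A minimal sketch of that estimate, assuming the input is a WAV file readable by Python's standard `wave` module and that the recognizer runs in a worker thread while the UI polls for progress. The names `audio_duration_seconds`, `estimated_progress`, and `cost_per_audio_second` are hypothetical, not part of allosaurus; the cost value has to be measured once on the target machine.

```python
import wave


def audio_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimated_progress(elapsed_seconds, audio_seconds, cost_per_audio_second):
    """Estimate the completed fraction from wall-clock time alone.

    cost_per_audio_second is an assumed, pre-measured value: how many
    seconds of inference one second of audio takes on this machine.
    """
    expected_seconds = audio_seconds * cost_per_audio_second
    if expected_seconds <= 0:
        return 1.0
    # Cap below 1.0 so the bar never claims completion before the
    # recognizer has actually returned.
    return min(elapsed_seconds / expected_seconds, 0.99)
```

The caller would record `time.monotonic()` just before starting recognition and pass the difference as `elapsed_seconds` on each UI refresh. Because this is only a time-based guess, the real completion signal should still come from the recognizer call returning.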

steveway commented 3 years ago

I see. I've added some logic to our program to do exactly that. I need to test some longer files to see how accurate this is. This way the progress is less accurate, but the result will not be changed by splitting the input.

Do you have any experience with how allosaurus behaves on long files?

xinjli commented 3 years ago

I tend to split the audio if it is longer than 30 seconds. Very large audio files would probably cause issues, because inference would use a lot of memory. I think you can do the same by detecting silence, since people do not keep speaking forever. You can use voice activity detection to split at the silences. It typically does not affect the performance.
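As a rough illustration of splitting at silence, the sketch below uses a plain RMS energy threshold per frame. This is a simple stand-in for real voice activity detection, not the method used in the thread. It assumes mono samples already normalized to floats in [-1, 1]; the function names, the 20 ms frame size, and the 0.01 threshold are all hypothetical choices that would need tuning.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive frame of samples."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def split_on_silence(samples, sample_rate, max_chunk_seconds=30.0,
                     frame_seconds=0.02, silence_threshold=0.01):
    """Return (start, end) sample index pairs of roughly bounded chunks.

    Once a chunk reaches the length limit, it is cut at the most recent
    quiet frame if one exists, otherwise at the current frame boundary.
    """
    frame_len = max(1, round(sample_rate * frame_seconds))
    max_len = max(frame_len, round(sample_rate * max_chunk_seconds))

    chunks = []
    chunk_start = 0
    last_quiet = None  # start index of the latest quiet frame in this chunk
    for i, energy in enumerate(frame_rms(samples, frame_len)):
        frame_start = i * frame_len
        frame_end = min(frame_start + frame_len, len(samples))
        if energy < silence_threshold and frame_start > chunk_start:
            last_quiet = frame_start
        if frame_end - chunk_start >= max_len:
            cut = last_quiet if last_quiet is not None else frame_end
            chunks.append((chunk_start, cut))
            chunk_start = cut
            last_quiet = None
    if chunk_start < len(samples):
        chunks.append((chunk_start, len(samples)))
    return chunks
```

Each returned pair can be used to slice the sample array and recognize the pieces in series, which also gives a natural per-chunk progress count. A dedicated voice activity detector would likely find better cut points than this energy threshold, especially with background noise.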