Closed FilBot3 closed 2 years ago
Hi,
It looks like your file uses 32-bit precision, while the original model assumes 16-bit precision. You need to convert it to 16-bit first, for example with sox: sox what.wav -b 16 what_out.wav
After that, it should give you good results:
$ python -m allosaurus.run -i what_out.wav
w ɒ tʲ
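For anyone hitting the same problem, here is a minimal sketch for checking a file's bit depth before running Allosaurus. It uses only Python's standard `wave` module; the helper names `describe_wav` and `needs_conversion` are made up for illustration and are not part of Allosaurus. Note that `wave` only reads integer PCM files, so a 32-bit float recording may raise `wave.Error` instead of returning a result.

```python
import wave


def describe_wav(path):
    """Return (bits_per_sample, sample_rate_hz, channels) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8  # getsampwidth() is in bytes
        return bits, wav.getframerate(), wav.getnchannels()


def needs_conversion(path, target_bits=16):
    """True if the file's bit depth differs from the 16-bit depth mentioned above."""
    bits, _rate, _channels = describe_wav(path)
    return bits != target_bits
```

If `needs_conversion("what.wav")` returns True, the file would need the sox conversion shown above before transcription.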
Thank you, I'll try this out. To nitpick, I didn't see that requirement in the README.
Yeah, I should mention it in the README. Thanks for the reminder!
Interesting, with ffmpeg, I think I did it correctly, but got a different result.
ffmpeg results versus my sox results.
Thank you for showing me the way.
It also seems like the file needs to be downsampled to 16000 Hz.
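Putting both steps together, a rough sketch of how the conversion commands could be built from Python. The flags themselves are standard (`-b` and `-r` for sox, `-ar` and `-c:a pcm_s16le` for ffmpeg), but the function names and the output file name `what_16k.wav` are made up here, and I have not confirmed that these exact settings are what Allosaurus requires.

```python
def build_sox_command(src, dst, sample_rate=16000):
    """sox arguments: 16-bit output (-b 16) at the given sample rate in Hz (-r)."""
    return ["sox", src, "-b", "16", "-r", str(sample_rate), dst]


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """ffmpeg arguments: resample (-ar) and encode as 16-bit little-endian PCM."""
    return ["ffmpeg", "-i", src, "-ar", str(sample_rate), "-c:a", "pcm_s16le", dst]


# Hypothetical file names, for illustration only.
sox_cmd = build_sox_command("what.wav", "what_16k.wav")
ffmpeg_cmd = build_ffmpeg_command("what.wav", "what_16k.wav")
```

Either list can be passed to `subprocess.run(cmd, check=True)` if the corresponding tool is installed.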
~/Downloads❯ python -m allosaurus.run -i what_16bit.wav
w a tʰ
The issue
I am currently trying to use Allosaurus to help a Speech Language Pathologist perform transcriptions, but I am having trouble getting the application to recognize the word "what", let alone longer WAV files with more complex sentences in them. Attached is the WAV file. The output I get from Allosaurus is:
I even installed the eng2102 model.
It was recorded using a Tascam DR-40X in 32-bit WAV format, then transferred over to a Pop!_OS Linux system.
Python Version
Pop!_OS Version
what.wav file
what.wav.zip
The question
I feel like I'm not doing something correctly. Do I need to train allosaurus to listen for English sounds as well? I expect to see something similar to
wʌt