Hi,
Thanks for the question! What do you mean by the fluctuation? Are you using the current model for your evaluation, or are you trying to fine-tune it?
I mean that a lot of phonemes are detected incorrectly, especially consonants, it seems. I was trying to detect the English th sound, and it's almost impossible. I'm using the built-in model.
Attached is an example file, generated by a Google synthetic voice. The model only recognizes one th sound (θ), but there should be 3. With my own recordings I could never reproduce a θ sound.
The model was trained by mixing many languages and many recording environments, so I would not be surprised if it fails to recognize a particular sound in a particular language.
We will release a couple of new models trained specifically on each of the major languages, including English (hopefully next month), so maybe you can try that model once it is released. It should significantly increase the English accuracy.
For the current model, if you expand the top-k candidates as mentioned in the README, it might surface some of the phones you are looking for: `$ python -m allosaurus.run --lang=eng -i google-th.wav --topk=5`
t (0.339) θ (0.197) b (0.138) ð (0.074) s (0.058) | a (0.917)
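If you want to check a whole batch of recordings for a given phone, a small script over this output may help. The following is only a minimal sketch: it assumes the top-k output always looks like the line above (candidates written as `phone (probability)`, frames separated by `|`), and the helper names `parse_topk` and `frames_with_phone` as well as the `min_prob` threshold are made up for illustration, not part of allosaurus.

```python
import re

# Matches one "phone (probability)" candidate, e.g. "θ (0.197)".
# Assumption: this mirrors the top-k output format shown above.
CANDIDATE = re.compile(r"(\S+) \(([0-9.]+)\)")


def parse_topk(line):
    """Split one line of top-k output into a list of frames.

    Each frame is a list of (phone, probability) tuples.
    """
    frames = []
    for chunk in line.split("|"):
        frames.append([(p, float(s)) for p, s in CANDIDATE.findall(chunk)])
    return frames


def frames_with_phone(frames, phone, min_prob=0.1):
    """Return indices of frames listing `phone` with probability >= min_prob.

    min_prob is a hypothetical threshold; tune it on your own data.
    """
    return [
        i
        for i, candidates in enumerate(frames)
        if any(p == phone and prob >= min_prob for p, prob in candidates)
    ]


output = "t (0.339) θ (0.197) b (0.138) ð (0.074) s (0.058) | a (0.917)"
frames = parse_topk(output)
print(frames_with_phone(frames, "θ"))
```

With the line above, θ shows up as a candidate in the first frame only, while ð falls below the example threshold.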
Hello @xinjli, have you released the new model yet?
Not yet, but we hope to release the model this month.
The new model has been released. Hope it will be helpful :)
Hello,
We're trying to evaluate allosaurus for a pronunciation trainer, but currently the results fluctuate a bit too much for it to be reliable. Are there any tips you have for getting more consistent results? How was the training data recorded, and was it processed in some way (compressor, noise reduction, etc.)? With this information we could adjust our input data and might get better results.
Peter