Kaljurand opened this issue 9 years ago
@Kaljurand Can you test the implementation?
Thanks for the quick implementation!
I got it working after changing the headers line to:
headers = {'Content-Type': 'audio/x-raw-int; rate=%s' % frame_rate}
I didn't test all the error condition handling though.
(Also, I had to comment out "import mad" in client/plugin.py because that dependency was not installed as part of the requirements.)
If you want to test it against an online server, you can use the URL http://bark.phon.ioc.ee/english/speech-api/v1/recognize. It's only meant as a demo (the recognition models are not very accurate, for example), so it should not be used as the default setting.
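For reference, a rough sketch of how a request to that endpoint could look from Python, assuming it accepts raw 16-bit PCM posted with the Content-Type header shown above and returns JSON with a list of hypotheses (the response shape is my assumption based on the kaldi-gstreamer-server docs, and may differ on this particular server):

# Sketch only: posts raw 16-bit PCM audio to the demo recognizer above.
# The "hypotheses"/"utterance" response fields are assumed from the
# kaldi-gstreamer-server docs.
import requests

def recognize(raw_pcm, frame_rate=16000,
              url='http://bark.phon.ioc.ee/english/speech-api/v1/recognize'):
    headers = {'Content-Type': 'audio/x-raw-int; rate=%s' % frame_rate}
    response = requests.post(url, data=raw_pcm, headers=headers)
    response.raise_for_status()
    hypotheses = response.json().get('hypotheses', [])
    return hypotheses[0].get('utterance', '') if hypotheses else ''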
Ideally the plugin should be based on the WebSocket interface. That would allow the audio to be streamed to the server while the user is still speaking, and the transcription to be processed as soon as it starts arriving, which would make the whole interaction snappier. I guess some of the other STT plugins would benefit from such a streaming mode as well. (But that's a separate issue.)
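To make this more concrete, here is a rough sketch of what streaming against the kaldi-gstreamer-server WebSocket interface could look like, using the websocket-client package. The /client/ws/speech path, the "EOS" end-of-stream marker and the JSON result shape follow that project's documentation; the host and port are placeholders, and this is not something the plugin implements yet:

# Sketch only: streams raw PCM chunks to a kaldi-gstreamer-server WebSocket
# endpoint and collects the final hypotheses as they arrive.
import json
import websocket  # pip install websocket-client

def stream_recognize(chunks, url='ws://localhost:8888/client/ws/speech'):
    # For raw PCM the server typically also needs the audio caps passed as a
    # content-type query parameter; omitted here to keep the sketch short.
    ws = websocket.create_connection(url)
    for chunk in chunks:  # iterable of raw PCM byte strings, e.g. from the mic
        ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)
    ws.send('EOS')  # tell the server that the audio stream is finished
    transcript = ''
    while True:
        message = ws.recv()
        if not message:  # server closes the socket once it is done
            break
        result = json.loads(message)
        segment = result.get('result', {})
        if segment.get('final'):
            transcript += segment['hypotheses'][0]['transcript']
    ws.close()
    return transcript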
That'd need some major changes to the way recording and transcription are handled right now. It's a good idea, but it needs some thinking.
Add an STT engine based on https://github.com/alumae/kaldi-gstreamer-server, which offers an HTTP interface (very similar to GoogleSTT) and a WebSocket interface.