Closed (Jamoth closed this 4 years ago)
You're correct. Rhasspy waits until the entire voice command is recorded before starting transcription. This was originally done as a way to increase the accuracy of Pocketsphinx; it does better when it receives a full "utterance" upfront.
Now that Rhasspy has embraced Kaldi for STT, it's time to take advantage of its streaming capabilities. The groundwork for this is being laid in the rhasspy-asr-kaldi service, which uses a C++ Python extension to communicate directly with Kaldi (rather than a shell script). A slight tweak to this extension should allow for streaming audio, so expect this in the near future!
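To make the difference concrete, here is a rough sketch of the two approaches. It is not the Rhasspy or Kaldi API: `ToyDecoder`, `transcribe_batch`, and `transcribe_streaming` are hypothetical names, and the "decoder" only counts chunks. The point is just how much decoding work is still left at the moment the speaker stops talking.

```python
from typing import Iterable, List, Tuple


class ToyDecoder:
    """Hypothetical stand-in for an STT decoder (not Kaldi's real interface)."""

    def __init__(self) -> None:
        self.chunks_decoded = 0

    def accept_chunk(self, chunk: bytes) -> None:
        # A real decoder would advance its search here; we only count the work.
        self.chunks_decoded += 1

    def finalize(self) -> str:
        return f"<transcript from {self.chunks_decoded} chunks>"


def transcribe_batch(chunks: Iterable[bytes]) -> Tuple[str, int]:
    """Record the full utterance first, then decode everything afterwards."""
    recorded: List[bytes] = list(chunks)  # nothing is decoded while recording
    pending_at_end_of_speech = len(recorded)
    decoder = ToyDecoder()
    for chunk in recorded:
        decoder.accept_chunk(chunk)
    return decoder.finalize(), pending_at_end_of_speech


def transcribe_streaming(chunks: Iterable[bytes]) -> Tuple[str, int]:
    """Feed each chunk to the decoder as soon as it arrives."""
    decoder = ToyDecoder()
    pending: List[bytes] = []
    for chunk in chunks:
        pending.append(chunk)
        decoder.accept_chunk(pending.pop())  # decoded while the user still speaks
    return decoder.finalize(), len(pending)


if __name__ == "__main__":
    audio = [b"\x00" * 320] * 50  # 50 fake audio chunks
    print(transcribe_batch(audio))
    print(transcribe_streaming(audio))
```

In the batch version all 50 chunks are still waiting to be decoded when the command ends; in the streaming version none are, so only the final step remains. Real latency depends on the decoder and hardware, of course.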
15 minutes ago I was reading Kaldi documentation regarding streaming features... And it's already here, awesome =)
Preliminary support for Kaldi streaming is now in the rhasspy-asr-kaldi library. Once I get this moved back into the main Rhasspy app, I'll let you know :)
Hello,
I recently tested whether SEPIA or Rhasspy might fit my needs as a replacement for Snips in my home automation setup. Comparing the performance of Kaldi STT (both use the same Zamia model, I think), SEPIA is a lot faster. I asked Florian from SEPIA and he says:
From what I saw in the Rhasspy logs, it starts transcribing after the voice command is finished. Is it possible to improve Rhasspy that way?