streaming audio transcription with kaldi

synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation

https://rhasspy.readthedocs.io

MIT License

streaming audio transcription with kaldi #122

Closed: Jamoth closed this issue 4 years ago

Jamoth commented 4 years ago

Hello,

I recently tested whether SEPIA and Rhasspy might fit my needs as a replacement for Snips in my home automation setup. Comparing the performance of Kaldi STT (both use the same Zamia model, I think), SEPIA is a lot faster. I asked Florian from SEPIA, and he says:

The SEPIA STT server is doing streaming audio transcription meaning the audio file is already transcribed while the user is still speaking.

From what I saw in the Rhasspy logs, it starts transcribing after the voice command is finished. Is it possible to improve Rhasspy that way?

synesthesiam commented 4 years ago

You're correct. Rhasspy waits until the entire voice command is recorded before starting transcription. This was originally done as a way to increase the accuracy of Pocketsphinx; it does better when it receives a full "utterance" upfront.
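To make the latency difference concrete, here is a rough back-of-the-envelope sketch in Python. It is not Rhasspy or Kaldi code: the function names, the fixed chunk length, and the constant per-chunk decode time are all assumptions chosen for illustration. It compares how long the user waits after they stop speaking when decoding starts only after recording ends versus when each chunk is decoded as it arrives.

```python
def batch_latency(num_chunks: int, chunk_s: float, decode_s: float) -> float:
    """Wait after end of speech when decoding starts only once recording ends.

    Hypothetical model: every chunk costs `decode_s` seconds to decode, and
    all of that work happens after the last chunk has been recorded.
    (`chunk_s` is unused here; it is kept so both functions share a signature.)
    """
    return num_chunks * decode_s


def streaming_latency(num_chunks: int, chunk_s: float, decode_s: float) -> float:
    """Wait after end of speech when each chunk is decoded as soon as it arrives.

    Hypothetical model: a single decoder processes chunks in order; a chunk
    can start decoding once it has been fully recorded and the decoder is idle.
    """
    decoder_free_at = 0.0
    for i in range(num_chunks):
        arrives_at = (i + 1) * chunk_s             # chunk i is fully recorded
        start = max(arrives_at, decoder_free_at)   # wait for the decoder if busy
        decoder_free_at = start + decode_s
    end_of_speech = num_chunks * chunk_s
    return decoder_free_at - end_of_speech


if __name__ == "__main__":
    # Assumed numbers: a 3 second command in 100 ms chunks, 50 ms to decode each.
    n, chunk_s, decode_s = 30, 0.1, 0.05
    print(f"batch:     {batch_latency(n, chunk_s, decode_s):.2f} s after end of speech")
    print(f"streaming: {streaming_latency(n, chunk_s, decode_s):.2f} s after end of speech")
```

Under these assumptions, as long as decoding a chunk is faster than recording it, the streaming wait is roughly the decode time of the final chunk, while the batch wait grows with the length of the command. Real decoders also do finalization work at the end of the utterance, which this model ignores.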

Now that Rhasspy has embraced Kaldi for STT, it's time to take advantage of its streaming capabilities. The groundwork for this is being laid in the rhasspy-asr-kaldi service, which uses a C++ Python extension to communicate directly with Kaldi (rather than a shell script). A slight tweak to this script should allow for streaming audio, so expect this in the near future!

frkos commented 4 years ago

15 minutes ago I was reading Kaldi documentation regarding streaming features... And it's already here, awesome =)

synesthesiam commented 4 years ago

Preliminary support for Kaldi streaming is now in the rhasspy-asr-kaldi library. Once I get this moved back into the main Rhasspy app, I'll let you know :)