respeaker / respeakerd

respeakerd is the server application for Seeed's microphone array solutions, based on librespeaker, which bundles the audio front-end processing algorithms.
MIT License

Replace AVS with custom ASR service #20

Open sskorol opened 4 years ago

sskorol commented 4 years ago

Hi @KillingJacky,

In the dev manual you mentioned:

It's also a good example showing how to utilize the librespeaker. Users can implement their own server application / daemon to invoke librespeaker.

Is there any reference on how to use respeakerd w/o AVS? I just want to apply DSP algorithms (AGC, NS, AEC, etc.) to the input audio stream captured from the ReSpeaker Core v2, and redirect the filtered audio as a byte array via WebSockets to my ASR server. Is there any similar example? Or maybe you could provide a short description of what should be changed in the existing code to support such a scenario?
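To make it concrete, here's roughly what I have in mind, pieced together from the librespeaker pointer-node examples and the respeakerd sources (treat the constructor arguments as assumptions on my side; they may not match your librespeaker version, so please check the chain_nodes headers):

```cpp
// Rough sketch of a librespeaker chain with no AVS client at all:
// the chain applies AEC / beamforming / NS, and we just read the
// processed blocks in a loop.
#include <memory>
#include <string>
#include <respeaker.h>
#include <chain_nodes/pulse_collector_node.h>
#include <chain_nodes/vep_aec_beamforming_node.h>
#include <chain_nodes/snowboy_mb_doa_kws_node.h>

using namespace respeaker;

int main() {
    // 1. Collect 48 kHz audio from PulseAudio and resample to 16 kHz.
    std::unique_ptr<PulseCollectorNode> collector(
        PulseCollectorNode::Create_48Kto16K("default", 8 /* block ms */));

    // 2. AEC + beamforming + NS front-end (arguments are placeholders).
    std::unique_ptr<VepAecBeamformingNode> frontend(
        VepAecBeamformingNode::Create(CIRCULAR_6MIC_7BEAM, false, 6, false));

    // 3. respeakerd uses this KWS/DoA node as the chain tail anyway;
    //    DisableAutoStateTransfer() keeps it streaming continuously
    //    instead of gating the output on the hotword.
    std::unique_ptr<SnowboyMbDoaKwsNode> kws(
        SnowboyMbDoaKwsNode::Create("common.res", "snowboy.umdl", "0.5"));
    kws->DisableAutoStateTransfer();

    frontend->Uplink(collector.get());
    kws->Uplink(frontend.get());

    std::unique_ptr<ReSpeaker> rsp(ReSpeaker::Create(INFO_LOG_LEVEL));
    rsp->RegisterChainByHead(collector.get());
    rsp->RegisterOutputNode(kws.get());
    rsp->Start();

    while (true) {
        // One block of processed 16 kHz / 16-bit mono PCM: exactly the
        // byte array I'd like to forward to my ASR server.
        std::string block = rsp->Listen();
        // ... ship `block` over a WebSocket here ...
    }
}
```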

P.S. I saw the Python client in a separate repo, but it doesn't apply any DSP.

Any help would be greatly appreciated.

sskorol commented 4 years ago

@fanjm95, @jerryyip, maybe you folks have some thoughts?

spidey99 commented 3 years ago

Bump!

I'm trying to create an always-listening device, circumventing the wake-word approach entirely, and want to pass the audio downstream for processing. I'm having a heck of a time peeling back the layers. I'm looking for an example similar to the one described above.

sskorol commented 3 years ago

@spidey99 it seems this repo is dead and no longer maintained. Moreover, the main contributors don't even answer emails. I couldn't find any help on the official forum either. Unfortunately, they flushed such a promising idea down the toilet.

I spent a lot of time poking around these repos and their dependencies. In the end, I decided to stop wasting time on this particular project. Frankly, I believe the entire idea of reusing ReSpeaker Core hardware with AVS is a dead end, as it makes no sense to buy a $99 board just to get another Alexa (an Echo Dot is much cheaper, especially on Black Friday).

In my view, Seeed Studio should have concentrated on the software side, so that developers all over the world could easily connect their own STT/TTS services. That would make more sense for people who want to build an offline ASR solution for languages that aren't supported by Amazon or Google. That's why I decided to focus my effort on extending the librespeaker samples.

Now I have a working prototype that can stream audio chunks to a custom WebSocket ASR server. Technically, there are two transports implemented in this repo: WS and MQTT. So we can send audio data whichever way we want.
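For reference, the WS transport idea boils down to something like the following (a simplified sketch rather than the actual code from my repo, using Boost.Beast as an example client; the host, port, and /asr path are placeholders for whatever your ASR server exposes):

```cpp
#include <boost/asio/connect.hpp>
#include <boost/asio/ip/tcp.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/websocket.hpp>
#include <string>

namespace net = boost::asio;
namespace websocket = boost::beast::websocket;
using tcp = net::ip::tcp;

int main() {
    // Hypothetical ASR endpoint: adjust host/port/path for your server.
    const std::string host = "192.168.0.10";
    const std::string port = "8080";

    net::io_context ioc;
    tcp::resolver resolver{ioc};
    websocket::stream<tcp::socket> ws{ioc};

    // Resolve, connect the underlying TCP socket, then do the WS handshake.
    auto const results = resolver.resolve(host, port);
    net::connect(ws.next_layer(), results.begin(), results.end());
    ws.handshake(host, "/asr");

    ws.binary(true);  // raw PCM goes out as binary frames

    // In the prototype the chunk comes from the librespeaker chain
    // (the Listen() call in the sketch above); fake 100 ms of
    // 16 kHz / 16-bit mono silence here so the example is self-contained.
    std::string chunk(3200, '\0');
    ws.write(net::buffer(chunk));

    ws.close(websocket::close_code::normal);
    return 0;
}
```

The MQTT transport is the same idea, just publishing each chunk to a topic instead of writing it to a socket.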

However, I'm not a C++ developer; my primary languages are Java and TS. There are still lots of things I want to improve, but I can't do it right now due to a lack of C++ expertise. If you have any ideas or suggestions, PRs are always welcome. I hope more people will want to resurrect and improve this idea, as it's really hard to do alone.

songtaoshi commented 3 years ago

Maybe I'm late and not quite understanding the context, but I think you can just use pyaudio to get the stream and push it into your ASR service.

sskorol commented 3 years ago

@songtaoshi if you just get the stream from pyaudio, no DSP algorithms will be applied at all. It makes no sense to send a raw audio stream to ASR w/o preprocessing. This board's value lies entirely in its DSP (NS, BF, AEC, etc.), which can only be accessed programmatically via librespeaker. I don't believe anyone wants to use $99 hardware just as a USB mic array; there are much cheaper alternatives for that.