k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Blank results from online-websocket-client-microphone.py #487

Open OswaldoBornemann opened 9 months ago

OswaldoBornemann commented 9 months ago

I tried running python-api-examples/online-websocket-client-microphone.py with sherpa-onnx-online-websocket-server already started, but I only got blank results.

Started! Please speak
{"is_final":false, "segment":0, "start_time":0.00, "text": "", "timestamps": [], "tokens":[]}
{"is_final":true, "segment":0, "start_time":0.00, "text": "", "timestamps": [], "tokens":[]}
{"is_final":false, "segment":0, "start_time":2.56, "text": "", "timestamps": [], "tokens":[]}
{"is_final":true, "segment":0, "start_time":2.56, "text": "", "timestamps": [], "tokens":[]}
{"is_final":false, "segment":0, "start_time":5.12, "text": "", "timestamps": [], "tokens":[]}
{"is_final":true, "segment":0, "start_time":5.12, "text": "", "timestamps": [], "tokens":[]}
{"is_final":false, "segment":0, "start_time":7.68, "text": "", "timestamps": [], "tokens":[]}
{"is_final":true, "segment":0, "start_time":7.68, "text": "", "timestamps": [], "tokens":[]}
{"is_final":false, "segment":0, "start_time":10.24, "text": "", "timestamps": [], "tokens":[]}
{"is_final":true, "segment":0, "start_time":10.24, "text": "", "timestamps": [], "tokens":[]}
{"is_final":false, "segment":0, "start_time":12.80, "text": "", "timestamps": [], "tokens":[]}

csukuangfj commented 9 months ago

What did you say after starting python-api-examples/online-websocket-client-microphone.py and what do you expect from the returned result?

OswaldoBornemann commented 9 months ago

Sorry if my explanation was not very clear. What I meant is, when I launched the sherpa-onnx-online-websocket-server, I observed that this service was already up and running in the background. Then, I tried to initiate the client service using python-api-examples/online-websocket-client-microphone.py because I wanted to utilize my computer's recording capabilities for real-time speech recognition. However, when I spoke, I noticed that the client returned empty results.
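For anyone wanting to check a longer client log for blank results, here is a minimal sketch that parses the JSON lines the client prints and counts how many carry any text. The field names (`is_final`, `text`) are taken from the output in the first comment; the helper name `summarize_results` is made up for illustration and is not part of sherpa-onnx.

```python
import json


def summarize_results(lines):
    """Count partial/final messages and collect any non-empty transcripts."""
    summary = {"partial": 0, "final": 0, "texts": []}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "Started! Please speak"
        msg = json.loads(line)
        key = "final" if msg.get("is_final") else "partial"
        summary[key] += 1
        text = msg.get("text", "").strip()
        if text:
            summary["texts"].append(text)
    return summary


# Two messages copied from the client output, plus the banner line.
sample = [
    "Started! Please speak",
    '{"is_final":false, "segment":0, "start_time":0.00, "text": "", "timestamps": [], "tokens":[]}',
    '{"is_final":true, "segment":0, "start_time":0.00, "text": "", "timestamps": [], "tokens":[]}',
]

result = summarize_results(sample)
print(result)
if not result["texts"]:
    print("Every message is blank: the server is receiving audio it cannot decode, or silence.")
```

If every message is blank while the server keeps emitting segments, the connection itself works and the problem lies in the audio or the model.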

csukuangfj commented 9 months ago

> when I launched the sherpa-onnx-online-websocket-server

Which model are you using?

> However, when I spoke

Did you speak English, and is the server using an English model?

OswaldoBornemann commented 9 months ago

I used the model sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12, and I spoke Chinese.

OswaldoBornemann commented 9 months ago

Similarly, I also tried the same functionality in Python, following the guide at https://k2-fsa.github.io/sherpa/onnx/websocket/online-websocket.html#start-the-client-python-with-microphone.

It appeared to have started successfully, but when I spoke, there was no output of any kind.

Started! Please speak
/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/features.cc:AcceptWaveformImpl:89 Creating a resampler:
   in_sample_rate: 48000
   output_sample_rate: 16000

csukuangfj commented 9 months ago

Please post the complete command you used to start the server.

csukuangfj commented 9 months ago

Also, please test it with https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speech-recognition-from-microphone-with-endpoint-detection.py, which does not use a server or a client. It makes debugging easier.

OswaldoBornemann commented 9 months ago

> Please post the complete command you used to start the server.

I see. This is the command I used to start the server:

(base) MacBook-Pro sherpa-onnx % python python-api-examples/speech-recognition-from-microphone.py \
--tokens=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt \
--encoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.onnx \
--decoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
--joiner=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.onnx
  0 DELL U2422HX, Core Audio (0 in, 2 out)
  1 iPhone Microphone, Core Audio (1 in, 0 out)
> 2 MacBook Pro Microphone, Core Audio (1 in, 0 out)
< 3 MacBook Pro Speakers, Core Audio (0 in, 2 out)
  4 Microsoft Teams Audio, Core Audio (2 in, 2 out)
Use default device: MacBook Pro Microphone
Started! Please speak
/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/features.cc:AcceptWaveformImpl:89 Creating a resampler:
   in_sample_rate: 48000
   output_sample_rate: 16000

OswaldoBornemann commented 9 months ago

> Also, please test it with https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speech-recognition-from-microphone-with-endpoint-detection.py, which does not use a server or a client. It makes debugging easier.

Okay. I will give it a try.

OswaldoBornemann commented 9 months ago

The result seems to be the same as with speech-recognition-from-microphone.py. I think the problem might be the microphone or the streaming input. I will check it.
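One way to narrow this down is to record a few seconds and look at the signal level before any recognizer is involved. The following is only a sketch under assumptions: it uses the third-party `sounddevice` package (the device list in the earlier comment looks like its output), the 48000 Hz rate is taken from the resampler log, and the silence threshold of 1e-3 is an arbitrary guess, not a value from sherpa-onnx.

```python
import math


def rms(samples):
    """Root-mean-square level of float samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_silent(samples, threshold=1e-3):
    """True if the signal level is below the (assumed) silence threshold."""
    return rms(samples) < threshold


def record_and_check(seconds=3, sample_rate=48000):
    """Record from the default input device and report its level.

    Requires a working microphone and `pip install sounddevice`.
    """
    import sounddevice as sd  # imported lazily so the helpers above work without it

    audio = sd.rec(int(seconds * sample_rate), samplerate=sample_rate,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    samples = audio[:, 0].tolist()
    return rms(samples), looks_silent(samples)


# To try it against a real microphone, speak while running:
#   level, silent = record_and_check()
#   print(f"RMS level: {level:.6f}, silent: {silent}")
```

If the recording is reported as silent while you are speaking, the recognizer never receives usable audio, which would match the empty output above.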

csukuangfj commented 9 months ago

The speech-recognition-from-microphone.py command you posted works perfectly on my side. Please check your microphone.

By the way, you can also use it to decode files. If that works, then the issue must be with your microphone.
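A rough sketch of the file-decoding check with the Python API, for comparison with the microphone path. It assumes a 16-bit mono WAV file and the model file names from the command posted earlier; the `sherpa_onnx` calls follow the pattern used in the repository's Python examples as far as I know, so please verify them against your installed version. The helper names `read_wave` and `decode_file` are made up for this sketch.

```python
import array
import sys
import wave


def read_wave(path):
    """Read a 16-bit mono PCM WAV file; return (sample_rate, floats in [-1, 1))."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        sample_rate = f.getframerate()
        pcm = array.array("h")
        pcm.frombytes(f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return sample_rate, [s / 32768.0 for s in pcm]


def decode_file(wav_path, model_dir):
    """Decode one file with a streaming transducer model (assumed API usage)."""
    import numpy as np
    import sherpa_onnx

    suffix = "epoch-20-avg-1-chunk-16-left-128.onnx"
    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=f"{model_dir}/tokens.txt",
        encoder=f"{model_dir}/encoder-{suffix}",
        decoder=f"{model_dir}/decoder-{suffix}",
        joiner=f"{model_dir}/joiner-{suffix}",
    )
    sample_rate, samples = read_wave(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.array(samples, dtype=np.float32))
    stream.input_finished()  # no more audio will arrive
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    return recognizer.get_result(stream)


# Example call, with a hypothetical WAV path:
#   print(decode_file("./my-test.wav",
#                     "./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12"))
```

If a known-good recording decodes to sensible text here but the microphone scripts stay silent, the model and installation are fine and the audio input is the remaining suspect.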

OswaldoBornemann commented 9 months ago

Yeah, I think so. I will check the microphone to see what happened.