k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
521 stars · 105 forks

Started server/client but not working properly #347

Closed treya-lin closed 1 year ago

treya-lin commented 1 year ago

Hi, so I just trained a streaming model on 100 h of LibriSpeech data with the icefall recipe pruned_transducer_stateless4, and now I am trying to use this model to learn how sherpa works. But after I started the server and the client, none of the web page functions ("upload" / "streaming-record" / "offline-record") work. Mind taking a look at what I might have done wrong?

I followed the instructions in these two docs to start the server (port 6006) and the client (port 6008):

The command I used to start the server on port 6006 of the machine 192.***.*.***:

./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py \
  --endpoint.rule3.min-utterance-length 1000.0 \
  --port 6006 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-model-filename ../icefall/egs/my_models/librispeech/pruned_transducer_stateless4/exp/cpu_jit.pt \
  --bpe-model-filename ../icefall/egs/my_models/librispeech/data/lang_bpe_500/bpe.model 

The log goes like

2023-03-22 20:08:06,271 INFO [streaming_server.py:294] Using device: cuda:0
2023-03-22 20:08:08,853 INFO [streaming_server.py:377] Warmup start
2023-03-22 20:08:10,543 INFO [streaming_server.py:391] Warmup done
2023-03-22 20:08:10,543 INFO [streaming_server.py:486] No certificate provided
2023-03-22 20:08:10,546 INFO [server.py:713] server listening on [::]:6006
2023-03-22 20:08:10,546 INFO [server.py:713] server listening on 0.0.0.0:6006
2023-03-22 20:08:10,550 INFO [streaming_server.py:503] Please visit one of the following addresses:

  http://0.0.0.0:6006
  http://localhost:6006
  http://127.0.0.1:6006
  http://192.***.*.***:6006

Then I started the client on port 6008 with these commands (is this the correct way to specify the server IP?):

cd ./sherpa/bin/web
python -m http.server --bind 192.***.*.*** 6008

The log goes like

Serving HTTP on 192.***.*.*** port 6008 (http://192.168.4.169:6008/) ...
172.16.11.180 - - [22/Mar/2023 20:12:02] "GET /offline_record.html HTTP/1.1" 304 -
172.16.11.180 - - [22/Mar/2023 20:12:04] "GET /offline_record.html HTTP/1.1" 304 -
172.16.11.180 - - [22/Mar/2023 20:12:07] "GET /js/streaming_record.js HTTP/1.1" 304 -

So now if I open the client page at 192.***.*.***:6008, the UI looks fine:

  1. the home page loads (screenshot)

  2. on the streaming-record page, I click the "click me to connect" button, but nothing changes except that a "!" appears at the end of the button label (screenshot)

Any idea what is going wrong? I am doing all of this on a remote server rather than my own laptop. Is that relevant to the problem, and how should I make it work? Thanks a lot!

csukuangfj commented 1 year ago

To debug the issue, please right-click on the page within your browser, click "Inspect" in the pop-up menu, then switch to the Console tab to view the error logs.


There are two possible fixes:

  1. Please create an ssh tunnel from your local computer to your remote server so that you can use localhost to access the service. The reason is that you must, must, must use localhost to access your microphone within your browser if you use http. I think we have documented it somewhere.

  2. Please use https to replace http. Please refer to our doc for how to use https. In this case, you can use the IP of the server to access the microphone within your browser.
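For option 1, the tunnel can be set up along these lines (a sketch; `user` is a placeholder, and the ports match the setup in this issue):

```shell
# Forward local ports 6006 (websocket server) and 6008 (web client)
# to the remote machine, then browse to http://localhost:6008.
# -N: do not run a remote command, just keep the tunnel open.
ssh -N -L 6006:localhost:6006 -L 6008:localhost:6008 user@192.***.*.***
```

With the tunnel up, the browser talks to localhost, which satisfies the microphone restriction for http.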

csukuangfj commented 1 year ago

Please refer to https://k2-fsa.github.io/sherpa/python/streaming_asr/secure-connections.html for how to use https.

Also, please refer to https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/conformer_rnnt_for_English/client.html#usage for how to debug. (screenshot)

treya-lin commented 1 year ago

> To debug the issue, please right click on the page within your browser and click inspect in the pop-up menu, then switch to the console tab to view error logs.
>
> There are two possible fixes:
>
>   1. Please create a ssh tunnel from your local computer to your remote server so that you can use localhost to access the service. The reason is that you must, must, must use localhost to access your microphone within your browser if you use http. I think we have documented it somewhere.
>   2. Please use https to replace http. Please refer to our doc for how to use https. In this case, you can use the IP of the server to access the microphone within your browser.

Hi, thanks for the reply. First I am trying the second solution (replacing http with https). I followed the instructions at https://k2-fsa.github.io/sherpa/python/streaming_asr/secure-connections.html but got some other errors. Mind taking a look?

Am I getting it right? So now I am using port 6007 for the https server and 6006 for the websocket server. I can open the corresponding https link (https://192.***.*.***:6007/), change the port field to "6006", and it connects to the websocket server.
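As a quick sanity check of this setup (illustrative commands; `-k` tells curl to accept the self-signed cert generated per the doc):

```shell
# The https page on port 6007 should come back with HTML.
curl -k https://192.***.*.***:6007/

# The websocket port should at least accept a TCP connection.
nc -zv 192.***.*.*** 6006
```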

The streaming-record mode seems to be working. (Is it normal that it gets disconnected after I click streaming-stop?)

But for upload mode and offline-record mode, it seems the audio is sent but the connection is dropped right after. Here is what it returns, plus the console log. (I just picked a random wav file that I converted from LibriSpeech data earlier, so the file itself should not be the problem? Offline-record mode has the same issue.)

(screenshot of the returned result)

treya-lin commented 1 year ago

In case the snapshot is not clear enough, the console log is pasted here:

protocol:  wss://
upload.js:26 server_ip:  192.***.*.***
upload.js:27 server_port:  6006
upload.js:31 uri wss://192.***.*.***:6006
upload.js:36 connected
upload.js:90 files: [object FileList]
upload.js:93 File {name: '700-122867-0000.wav', lastModified: 1569393760508, lastModifiedDate: Wed Sep 25 2019 14:42:40 GMT+0800 (中国标准时间), webkitRelativePath: '', size: 82638, …}
upload.js:94 file.name 700-122867-0000.wav
upload.js:95 file.type audio/wav
upload.js:96 file.size 82638
upload.js:100 reading file!
upload.js:107 num_samples 41297
upload.js:122 buf length, 165188
upload.js:52 Received message:  {"method": "greedy_search", "segment": 0, "frame_offset": 32, "text": "", "tokens": [], "timestamps": [], "final": false}
upload.js:56 Sent Done
upload.js:44 disconnected
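For reference, the `Received message` line above is JSON; a small Python sketch (field names taken verbatim from the log) shows how to decode it:

```python
import json

# The recognition result copied from the console log above.
msg = ('{"method": "greedy_search", "segment": 0, "frame_offset": 32, '
       '"text": "", "tokens": [], "timestamps": [], "final": false}')

result = json.loads(msg)
print(result["method"])  # greedy_search
print(result["final"])   # False -- the server had not sent a final result yet
```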
csukuangfj commented 1 year ago

> The streaming-record mode seems to be working. (Is it normal that it got disconnected after I click streaming-stop?)

Yes, it is normal and is also expected. It will disconnect after clicking streaming stop.


> But for upload mode and offline-record mode, it seems the audio is sent but got disconnected right after.

That is expected: offline-record only works with non-streaming models.

Offline is also known as non-streaming, and online is known as streaming. You cannot mix them.

csukuangfj commented 1 year ago

By the way, the latest master can make https and ws share the same port.

Please use the following option https://github.com/k2-fsa/sherpa/blob/5b083605a895bf289b49a92b388304b2e86523e3/sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py#L201-L209

treya-lin commented 1 year ago

> The streaming-record mode seems to be working. (Is it normal that it got disconnected after I click streaming-stop?)
>
> Yes, it is normal and is also expected. It will disconnect after clicking streaming stop.
>
> But for upload mode and offline-record mode, it seems the audio is sent but got disconnected right after.
>
> That is expected. offline-record only works with non-streaming models.
>
> offline is also known as non-streaming and online is known as streaming. You cannot mix with them.

Hi, thanks for the reply. I seeee... So does it mean that for these three modes I need to start three different services? I just tried ./sherpa/bin/pruned_transducer_statelessX/offline_server.py to start another service on port 6008 with the same streaming model.

./sherpa/bin/pruned_transducer_statelessX/offline_server.py \
  --endpoint.rule3.min-utterance-length 100.0 \
  --port 6008 \
  --nn-model-filename ../icefall/egs/my_models/librispeech/pruned_transducer_stateless4/exp/cpu_jit.pt \
  --bpe-model-filename ../icefall/egs/yikan_models/librispeech/data/lang_bpe_500/bpe.model \
  --certificate ./sherpa/bin/web/cert.pem

And now from my 6007 client page I can connect to 6006 for streaming recording, and to 6008 to upload a wav file for decoding. As for offline recording: can I use any script in sherpa to run that service with the streaming model too, or are they just not compatible?

treya-lin commented 1 year ago

> By the way, the latest master can make https and ws share the same port.
>
> Please use the following option
>
> https://github.com/k2-fsa/sherpa/blob/5b083605a895bf289b49a92b388304b2e86523e3/sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py#L201-L209

Oh so that's why we need this argument here. I see. Thanks!

And another question: the model I exported in icefall is cpu_jit.pt, so does that mean it decodes with the CPU? But it seems these two scripts still loaded the model onto the GPU. (screenshot)

Is there a problem here? Or should I explicitly disable the use of CUDA?

csukuangfj commented 1 year ago

> And another question is that the model I exported in icefall is the cpu_jit.pt, does it mean it decodes with CPU?

Please have a look at our doc at https://k2-fsa.github.io/icefall/model-export/export-with-torch-jit-script.html

(screenshot from the doc)


> But seems these two scripts still loaded the model to GPU?

That is due to https://github.com/k2-fsa/sherpa/blob/5b083605a895bf289b49a92b388304b2e86523e3/sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py#L297-L300

To use CPU, please set the following environment variable before starting the server:

export CUDA_VISIBLE_DEVICES=""
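The device-selection logic at the linked lines boils down to roughly the following (a paraphrased sketch, not the exact source):

```python
def pick_device(cuda_available: bool) -> str:
    # Paraphrase of the server's behavior: use the first GPU when
    # torch.cuda.is_available() reports True, otherwise fall back to CPU.
    return "cuda:0" if cuda_available else "cpu"

# With CUDA_VISIBLE_DEVICES="" no GPU is visible, so the server picks:
print(pick_device(False))  # cpu
```

So `cpu_jit.pt` only means the checkpoint was scripted for loading without a GPU; the server still moves it to a GPU if one is visible.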
treya-lin commented 1 year ago

> And another question is that the model I exported in icefall is the cpu_jit.pt, does it mean it decodes with CPU?
>
> Please have a look at our doc at https://k2-fsa.github.io/icefall/model-export/export-with-torch-jit-script.html
>
> But seems these two scripts still loaded the model to GPU?
>
> That is due to https://github.com/k2-fsa/sherpa/blob/5b083605a895bf289b49a92b388304b2e86523e3/sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py#L297-L300
>
> To use CPU, please set the following environment variable before starting the server:
>
> export CUDA_VISIBLE_DEVICES=""

I see. Thanks. It's working beautifully now.

And for these three modes I need to start three different services, right? I just tried ./sherpa/bin/pruned_transducer_statelessX/offline_server.py to start another service on port 6008 with the same streaming model.

So is it that with a streaming model we can only turn on the upload mode and the streaming-recording mode, and with a non-streaming model we can only turn on the upload mode and the offline-recording mode. Is that the case?

csukuangfj commented 1 year ago

> So is it that with a streaming model we can only turn on the upload mode and streaming recording mode, and with a non-streaming model, we can only turn on the upload mode and offline recording mode. Is that the case?

@treya-lin Sorry for the late reply.

Yes, you're right.

treya-lin commented 1 year ago

> So is it that with a streaming model we can only turn on the upload mode and streaming recording mode, and with a non-streaming model, we can only turn on the upload mode and offline recording mode. Is that the case?
>
> @treya-lin Sorry for the late reply.
>
> yes, you're right.

No worries! Thanks, got it! :D