k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Websocket connection terminates under large load - streaming_server.py - streaming zipformer model #370

Open uni-sagar-raikar opened 1 year ago

uni-sagar-raikar commented 1 year ago

Hi @csukuangfj ,

The latest commit for streaming zipformer model support in the streaming server works fine. However, the websocket connection breaks when the number of workers is increased while benchmarking with decode_manifest.py.

The error is as follows:

```
File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/client.py", line 655, in __await_impl_timeout__
    return await self.__await_impl__()
File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/async_timeout.py", line 169, in __aexit__
    self._do_exit(exc_type)
File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/async_timeout.py", line 252, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
```

Is this due to a low timeout when opening the websocket connection?
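One way to avoid handshake timeouts with many clients is to throttle how many connection attempts are in flight at once on the client side. Below is a minimal sketch; `open_connection` is a hypothetical stand-in for the real `websockets.connect()` call, and `MAX_INFLIGHT` is an assumed limit you would tune:

```python
import asyncio

MAX_INFLIGHT = 50  # assumed limit on concurrent handshakes; tune as needed

async def open_connection(client_id):
    # Placeholder for the real websocket handshake; sleeps to simulate latency.
    await asyncio.sleep(0.001)
    return f"client-{client_id} connected"

async def run_clients(num_clients):
    sem = asyncio.Semaphore(MAX_INFLIGHT)

    async def one(client_id):
        async with sem:  # at most MAX_INFLIGHT handshakes at a time
            return await open_connection(client_id)

    return await asyncio.gather(*(one(i) for i in range(num_clients)))

results = asyncio.run(run_clients(200))
print(len(results))  # 200
```

Separately, recent versions of the `websockets` package accept an `open_timeout` argument to `connect()` that can be raised; check which version your installed `websockets` is before relying on it.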

Thanks in advance

csukuangfj commented 1 year ago

when number of workers are increased

How many clients are you using?

uni-sagar-raikar commented 1 year ago

It works fine with 100 clients in my observation; beyond that the connections break. Specifically, I am trying 200 and 400.

csukuangfj commented 1 year ago

Specifically I am trying 200 and 400

In that case, how many threads on the server side are you using?

I suggest that you switch to our C++ websocket server if you have many clients.

uni-sagar-raikar commented 1 year ago

On the server side, 500 threads. Sure, I am about to try the C++ websocket server; I will update by tomorrow.

csukuangfj commented 1 year ago

On server side, 500 threads

That is too many threads on the server side. I don't think you need that many.

What arguments are you using to start the python server?

uni-sagar-raikar commented 1 year ago

```
python3 streaming_server.py \
  --port=6006 \
  --use-gpu=true \
  --num-threads=500 \
  --max-batch-size=500 \
  --decoding-method=greedy_search
```

So, what's the difference between `--num-threads`, `--max-connections`, and `--max-batch-size`? And what are the recommended parameter values?

csukuangfj commented 1 year ago
--num-threads

It specifies the size of the thread pool used for neural network computation.

--max-batch-size

It specifies the maximum batch size.


what are the recommended parameter values?

You have to tune them for your own use case.


In your current case, 500 for both the thread count and the batch size is clearly not appropriate.

The number of threads should not exceed the number of CPUs you have. Also, please specify a smaller value for `--max-batch-size`, e.g., 10.
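The rule of thumb above can be expressed as a tiny helper. This is a hypothetical sketch, not part of sherpa; `suggest_num_threads` and its cap of 16 are assumptions, and the result would feed the server's `--num-threads` flag:

```python
import os

def suggest_num_threads(max_threads=16):
    # Cap the neural-network thread pool at the number of CPUs,
    # with an additional (assumed) upper bound to avoid oversubscription.
    cpus = os.cpu_count() or 1
    return min(cpus, max_threads)

print(suggest_num_threads())
```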

uni-sagar-raikar commented 1 year ago

@csukuangfj Thanks for the suggestions on parameter choices; things are better now.

Additional query: when we decode audio with streaming_client.py, the hypothesis is consistent, but the same audio used in benchmarking with decode_manifest.py gives worse results, including blank hypotheses. Any pointers on this?

uni-sagar-raikar commented 1 year ago

Any updates on this? I see that, with the sherpa websocket server setup, the partial results come out fine but the final hypothesis is always blank. Is this a configuration issue?
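For what it's worth, one common cause of a blank final hypothesis in streaming decoding is ending the stream without enough trailing audio for the model to flush its last frames. A hypothetical client-side sketch of appending tail silence before closing the stream (the 0.32 s padding length is an assumed value; the real sherpa clients may use a different amount or handle this internally):

```python
import array

SAMPLE_RATE = 16000     # assumed sample rate
TAIL_SECONDS = 0.32     # assumed tail-padding length

def pad_tail(samples):
    # Append silence so the decoder can emit its final tokens
    # before the stream is closed.
    padded = array.array("f", samples)
    padded.extend([0.0] * int(TAIL_SECONDS * SAMPLE_RATE))
    return padded

audio = array.array("f", [0.0] * SAMPLE_RATE)  # 1 s of dummy 16 kHz audio
padded = pad_tail(audio)
print(len(padded))  # 21120 (16000 + 5120)
```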

csukuangfj commented 1 year ago

Could you show the detailed commands you are using, and, if possible, provide a wave file whose recognition results are inconsistent?