k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

How to decode a list of files with the online websocket server/client? #359

Closed. pehonnet closed this issue 1 year ago.

pehonnet commented 1 year ago

Hello, thanks for this great tool!

I have a question regarding sherpa-online-websocket-server and sherpa-online-websocket-client. I am able to start the server and to send audio with the client as expected according to the documentation. Now I would like to send many audio files one after the other with the client, but it seems to fail. I have not looked into the code, but I suspect there is a limit on the number of files that can be passed with a command like the following:

sherpa-online-websocket-client \
  --server-ip=127.0.0.1 \
  --server-port=6006 \
  /path/to/foo.wav \
  /path/to/foo1.wav \
  /path/to/foo2.wav

I could also send the files one by one with a script, but that seems to be very slow (even with multiple channels in parallel) compared to decoding in icefall. I imagine some time is lost every time the client is started, which is why I would like to know how to send many files in one command.

EDIT: I now see that this version of the client actually only ever works with one file, unlike some of the Python clients. Is there a way to send more than one file when using the online websocket server?

Thanks!

csukuangfj commented 1 year ago

Are you constrained to use server/client-based decoding? It is possible to decode files in batches from Python with sherpa.
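
Roughly, batch decoding from Python could look like the sketch below. Please note that the exact names used here (OfflineRecognizerConfig, accept_wave_file, decode_streams, the result field) are assumptions; check the Python examples in the sherpa repository for the actual API and the model-related options.

# Sketch of batch (non-streaming) decoding with the sherpa Python API.
# NOTE: the names below (OfflineRecognizerConfig, accept_wave_file,
# decode_streams, ...) are assumptions; see the sherpa Python examples
# for the real signatures. The model paths are placeholders.
import sherpa

config = sherpa.OfflineRecognizerConfig(
    nn_model="/path/to/cpu_jit.pt",
    tokens="/path/to/tokens.txt",
)
recognizer = sherpa.OfflineRecognizer(config)

wav_files = ["/path/to/foo.wav", "/path/to/foo1.wav", "/path/to/foo2.wav"]

# Create one stream per file and decode all of them together as one batch.
streams = []
for wav in wav_files:
    s = recognizer.create_stream()
    s.accept_wave_file(wav)
    streams.append(s)

recognizer.decode_streams(streams)

for wav, s in zip(wav_files, streams):
    print(wav, "->", s.result.text)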

pehonnet commented 1 year ago

I wanted to use the server/client framework because it is closer to my use case. But after re-running the test I found it was actually not that slow, so I don't think there is a real limitation. By the way, assuming I want to run, say, 32 clients in parallel, am I right that setting the option --num-work-threads=32 is enough? When I monitor resource usage, the CPU usage of the server never reaches 3200%; it stays much lower. On the other hand, some of the clients seem to use more than 100% CPU (>160%). It could also be that the data I am sending (mostly short utterances) never requires the full capacity of the server at any point...

csukuangfj commented 1 year ago

By the way, assuming I want to use for example 32 clients in parallel, am I right that setting the option --num-work-threads=32 is enough?

No, you don't need to use that many threads.

The server supports an argument --max-batch-size, which means it can process requests from multiple clients in a single thread, in batches.

Please use more clients, e.g., tens or even hundreds, to give the server more work to do, so that you can see very high CPU usage.
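
For example, a small driver script can keep many client processes in flight at once. This is just a sketch; it runs the same sherpa-online-websocket-client command shown above, one process per wav file, so it only relies on options that already appear in this thread:

# Sketch: decode a list of wav files by running many
# sherpa-online-websocket-client processes concurrently.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

SERVER_IP = "127.0.0.1"
SERVER_PORT = 6006
NUM_CLIENTS = 32  # number of client processes to keep running at the same time

def decode_one(wav):
    # Each call launches one client process that sends a single file.
    cmd = [
        "sherpa-online-websocket-client",
        f"--server-ip={SERVER_IP}",
        f"--server-port={SERVER_PORT}",
        wav,
    ]
    return wav, subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    wav_files = sys.argv[1:]  # e.g. /path/to/foo.wav /path/to/foo1.wav ...
    with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
        for wav, out in pool.map(decode_one, wav_files):
            print(wav)
            print(out)

With enough files in flight, the server can group requests into batches (up to --max-batch-size), and its CPU usage should go up accordingly.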

pehonnet commented 1 year ago

Sorry, I have not had time to test it further, but I think there is no real issue, so I am closing this. Thanks again!