k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Sequence_Batching #533

Open rizwanishaq opened 5 months ago

rizwanishaq commented 5 months ago

I am checking with sequence_batching:

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  oldest {
    max_candidate_sequences: 1024
    max_queue_delay_microseconds: 5000
  }
}
```

Why do we have 1024 for max_candidate_sequences? If we use direct, isn't it going to be much faster?

csukuangfj commented 5 months ago

Could you show the code related to sequence_batching in sherpa?

rizwanishaq commented 5 months ago

The config is the config.pbtxt file here: https://github.com/k2-fsa/sherpa/blob/master/triton/model_repo_streaming/feature_extractor/config.pbtxt.template

csukuangfj commented 5 months ago

@yuekaizhang Could you have a look at this issue?

yuekaizhang commented 5 months ago

> I am checking with sequence_batching:
>
> ```
> sequence_batching {
>   max_sequence_idle_microseconds: 5000000
>   oldest {
>     max_candidate_sequences: 1024
>     max_queue_delay_microseconds: 5000
>   }
> }
> ```
>
> Why do we have 1024 for max_candidate_sequences? If we use direct, isn't it going to be much faster?

We didn't tune max_candidate_sequences here; it's just an arbitrary choice. Could you please explain why direct() would be much faster? We haven't tried direct() yet. It would be great if direct() could speed things up. @rizwanishaq
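For reference, switching strategies would mean replacing the `oldest` block with a `direct` block in the config.pbtxt. A sketch only: the field names follow Triton's model-configuration schema, and the delay value below is illustrative, not tuned.

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct {
    max_queue_delay_microseconds: 5000
  }
}
```

With the direct strategy, each sequence is bound to a fixed batch slot for its whole lifetime, so per-request candidate selection done by the oldest strategy is skipped; the trade-off is that the number of concurrent sequences is capped by the number of batch slots.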

rizwanishaq commented 4 months ago

@yuekaizhang I have tried both direct and oldest, and for streaming applications direct is much better, as my streaming app works on 10 ms chunks. I only have one issue that I don't know how to solve: when max_sequence_idle_microseconds: 5000000 expires, I have no way to detect it inside the model. Is there a way to trigger or handle this, or some other approach?

yuekaizhang commented 4 months ago

> @yuekaizhang I have tried both direct and oldest, and for streaming applications direct is much better, as my streaming app works on 10 ms chunks. I only have one issue that I don't know how to solve: when max_sequence_idle_microseconds: 5000000 expires, I have no way to detect it inside the model. Is there a way to trigger or handle this, or some other approach?

@rizwanishaq Would you mind clarifying the question? max_sequence_idle_microseconds means: if the idle time of a sequence exceeds max_sequence_idle_microseconds, the inference server frees the slot allocated to that sequence by simply discarding it.
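To illustrate the semantics, here is a standalone toy sketch (not Triton code; the `SlotPool` class and its names are made up for illustration) of the server-side bookkeeping: each sequence's last activity time is tracked, and once a sequence has been idle longer than max_sequence_idle_microseconds its slot is silently reclaimed. The model itself is not called back, which is why you cannot observe the timeout from inside the model.

```python
import time


class SlotPool:
    """Toy model of the server-side bookkeeping: maps correlation IDs to
    batch slots and reclaims slots whose sequences have been idle too long."""

    def __init__(self, num_slots, max_idle_us=5_000_000):
        self.max_idle_us = max_idle_us      # mirrors max_sequence_idle_microseconds
        self.free = list(range(num_slots))  # unallocated slot indices
        self.last_seen = {}                 # corr_id -> last activity (monotonic us)
        self.slot_of = {}                   # corr_id -> slot index

    def _now_us(self):
        return time.monotonic_ns() // 1_000

    def on_request(self, corr_id):
        """Record activity for a sequence, allocating a slot if it is new."""
        self.reap_idle()
        if corr_id not in self.slot_of:
            self.slot_of[corr_id] = self.free.pop()
        self.last_seen[corr_id] = self._now_us()
        return self.slot_of[corr_id]

    def reap_idle(self):
        """Silently free slots of sequences idle past the threshold;
        the sequence is just discarded, with no callback into the model."""
        now = self._now_us()
        for corr_id in [c for c, t in self.last_seen.items()
                        if now - t > self.max_idle_us]:
            self.free.append(self.slot_of.pop(corr_id))
            del self.last_seen[corr_id]
```

Because reclamation is silent, the usual workaround is for the client to mark the last request of each sequence with the sequence-end flag instead of relying on the idle timeout, so the model sees an explicit end-of-sequence.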

It would be great if direct() performs better. I would appreciate it if you could find some spare time to attach perf results comparing direct() and oldest(), similar to https://github.com/k2-fsa/sherpa/issues/306#issuecomment-1633858997. That would be useful for us.