Open rizwanishaq opened 5 months ago
Could you show the code related to sequence_batching in sherpa?
The code is in the config.pbtxt.template file here: https://github.com/k2-fsa/sherpa/blob/master/triton/model_repo_streaming/feature_extractor/config.pbtxt.template
@yuekaizhang Could you have a look at this issue?
I am looking at the sequence_batching settings: sequence_batching { max_sequence_idle_microseconds: 5000000 oldest { max_candidate_sequences: 1024 max_queue_delay_microseconds: 5000 } }
Why is max_candidate_sequences set to 1024? If we used direct() instead, wouldn't it be much faster?
We didn't tune max_candidate_sequences here; the value is an arbitrary choice. Could you please explain why direct() would be much faster? We haven't tried direct() yet. It would be great if direct() could speed things up. @rizwanishaq
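For comparison, here is a rough sketch of what the same block might look like with the direct scheduling strategy. It assumes the rest of config.pbtxt.template stays unchanged; the max_queue_delay_microseconds value is simply carried over from the oldest settings above and has not been tuned or benchmarked.

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  # direct strategy: each sequence is pinned to one batch slot for its lifetime.
  direct {
    # Assumed value, copied from the oldest block above; not tuned.
    max_queue_delay_microseconds: 5000
  }
}
```

Note that max_candidate_sequences belongs to the oldest strategy only, so it is dropped here.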
@yuekaizhang I have tried both direct and oldest, and for a streaming application direct is much better, since my streaming app works on 10 ms chunks. I have only one issue that I don't know how to solve: when the max_sequence_idle_microseconds: 5000000 timeout occurs, I have found no way to detect it. Is there a way to trigger something inside the model when this happens, or any other way to handle it?
@rizwanishaq Would you mind clarifying the question? max_sequence_idle_microseconds means: if a sequence stays idle for longer than this limit, the inference server frees the sequence slot allocated to that sequence and simply discards the sequence.
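On the question of reacting inside the model: the sketch below shows only the standard sequence control inputs that Triton can pass to a model. It is not a timeout notification. The tensor names START, END, READY and CORRID are illustrative assumptions and may not match what the sherpa config uses. With these declared, the model can at least see when a slot begins a new sequence and reset its state at that point, which also covers a slot that was reclaimed after an idle timeout.

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      name: "START"   # assumed tensor name
      control [ { kind: CONTROL_SEQUENCE_START  fp32_false_true: [ 0, 1 ] } ]
    },
    {
      name: "END"     # assumed tensor name
      control [ { kind: CONTROL_SEQUENCE_END  fp32_false_true: [ 0, 1 ] } ]
    },
    {
      name: "READY"   # assumed tensor name
      control [ { kind: CONTROL_SEQUENCE_READY  fp32_false_true: [ 0, 1 ] } ]
    },
    {
      name: "CORRID"  # assumed tensor name
      control [ { kind: CONTROL_SEQUENCE_CORRID  data_type: TYPE_UINT64 } ]
    }
  ]
}
```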
It would be great if direct() turned out to be better. If you have some spare time, I would appreciate it if you could attach some perf results comparing direct() and oldest(), similar to this: https://github.com/k2-fsa/sherpa/issues/306#issuecomment-1633858997. That would be useful for us.