k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

In sherpa/triton, joiner.onnx inference is very slow #447

Open arbs-gpu opened 11 months ago

arbs-gpu commented 11 months ago

I used this code https://github.com/k2-fsa/sherpa/tree/master/triton/zipformer/model_repo_offline to start the Triton service and sent requests while measuring the time spent in the encoder, decoder, and joiner modules. I found that the joiner module accounts for 95% of the time, while the encoder and decoder together account for less than 5%. This seems very abnormal. Is there an error in how the model is exported to ONNX, or is there an error in this code: https://github.com/k2-fsa/sherpa/blob/master/triton/zipformer/model_repo_offline/scorer/1/model.py?
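One way to narrow this down is to time joiner.onnx directly with ONNX Runtime, outside Triton: if a single joiner call is fast in isolation, the time is probably going into the many per-frame calls made by the scorer loop rather than into the exported model itself. Below is a minimal sketch; the input names `encoder_out`/`decoder_out` and `joiner_dim = 512` are assumptions about a typical zipformer export, so verify them against `sess.get_inputs()` first.

```python
# Hedged sketch: benchmark joiner.onnx standalone with ONNX Runtime.
# Input names, shapes, and joiner_dim are assumptions; inspect the
# printed sess.get_inputs() output and adjust to match your export.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "joiner.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)  # verify the assumed names/shapes

joiner_dim = 512  # assumption: depends on your exported model
feed = {
    "encoder_out": np.random.randn(1, joiner_dim).astype(np.float32),
    "decoder_out": np.random.randn(1, joiner_dim).astype(np.float32),
}

# Warm up, then time many calls to get a stable average.
for _ in range(10):
    sess.run(None, feed)
n = 1000
start = time.perf_counter()
for _ in range(n):
    sess.run(None, feed)
print(f"avg joiner latency: {(time.perf_counter() - start) / n * 1e3:.3f} ms")
```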

ziyu123 commented 11 months ago

I have the same problem; maybe your test audio is too long. (During transducer decoding the joiner is invoked once per encoder frame, so its total time grows with audio length, while the encoder runs as a single batched call.)

csukuangfj commented 11 months ago

@yuekaizhang Could you have a look at this issue?

danpovey commented 11 months ago

Are there decoder settings that would affect this? (I assume it depends on the search method, beam sizes, etc.)

yuekaizhang commented 11 months ago

> ... Triton service and send requests to count the time spent on the encoder, decoder, and joiner modules. I found ...

How do you count the time for the Triton modules? A typical distribution of time across modules looks like this: https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/log/stats_summary.txt.
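For reference, Triton itself records cumulative per-model queue and compute times, which avoids client-side instrumentation entirely. A hedged sketch using the `tritonclient` statistics API; the model names listed are guesses based on the model_repo_offline layout and may differ in your deployment:

```python
# Hedged sketch: query Triton's cumulative per-model inference statistics.
# Model names follow the model_repo_offline layout and are assumptions.
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

client = grpcclient.InferenceServerClient(url="localhost:8001")
for name in ["encoder", "decoder", "joiner", "scorer", "transducer"]:
    try:
        stats = client.get_inference_statistics(model_name=name, as_json=True)
    except InferenceServerException:
        continue  # model name not present in this repository
    for m in stats.get("model_stats", []):
        infer = m["inference_stats"]["compute_infer"]
        count, ns = int(infer.get("count", 0)), int(infer.get("ns", 0))
        if count:
            print(f"{m['name']}: {count} calls, "
                  f"avg compute {ns / count / 1e6:.3f} ms")
```

The same counters are also exposed on Triton's Prometheus metrics endpoint (by default `curl localhost:8002/metrics`), which is handy if you already scrape metrics.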

Also, what's your test audio length? (It should be okay if it is shorter than 30 seconds.)