k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Has offline zipformer TensorRT support been added? #637

Open Vergissmeinicht opened 3 weeks ago

Vergissmeinicht commented 3 weeks ago

I have checked the scripts at https://github.com/k2-fsa/sherpa/tree/master/triton/scripts, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?
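For concreteness, the workflow I have in mind is roughly the sketch below (the export-onnx.py flags and the encoder input names x/x_lens follow the librispeech zipformer recipe in icefall; the file names and shapes are my assumptions):

```bash
# Sketch only: flags and file names follow the icefall librispeech
# zipformer recipe and may differ in other versions.

# 1. Export the offline (non-streaming) zipformer to ONNX.
python3 ./zipformer/export-onnx.py \
  --exp-dir ./zipformer/exp \
  --tokens ./data/lang_bpe_500/tokens.txt \
  --epoch 30 --avg 9 --causal 0

# 2. Build a TensorRT engine for the encoder with dynamic shapes.
#    x is (N, T, 80) fbank features, x_lens is (N,) valid lengths.
trtexec \
  --onnx=./zipformer/exp/encoder-epoch-30-avg-9.onnx \
  --minShapes=x:1x100x80,x_lens:1 \
  --optShapes=x:4x1000x80,x_lens:4 \
  --maxShapes=x:16x3000x80,x_lens:16 \
  --fp16 \
  --saveEngine=./encoder_fp16.plan
```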

csukuangfj commented 3 weeks ago

@yuekaizhang Could you have a look?

yuekaizhang commented 3 weeks ago

> I have checked the scripts at https://github.com/k2-fsa/sherpa/tree/master/triton/scripts, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?

@Vergissmeinicht Not yet. Let me do it and I will post an update here.

Vergissmeinicht commented 3 weeks ago

> I have checked the scripts at https://github.com/k2-fsa/sherpa/tree/master/triton/scripts, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?
>
> @Vergissmeinicht Not yet. Let me do it and I will post an update here.

Thanks! FYI, I've tried the ONNX model from https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-gigaspeech-2023-12-12-english with ONNX export and trtexec, but trtexec fails while parsing a softmax op with a 1-d input. I then tried onnx-graphsurgeon to fix the 1-d input problem, but trtexec still fails on the If-conditional outputs that come from CompactRelPositionalEncoding.
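One extra cleanup pass that may be worth trying before trtexec is constant folding with polygraphy, which can collapse If nodes whose predicate is constant (a hedged sketch; I have not verified it removes this particular CompactRelPositionalEncoding branch):

```bash
# Fold constants so that If nodes with constant predicates (e.g. the
# torch.jit tracing checks) collapse into a single branch.
pip install polygraphy onnx onnxruntime

polygraphy surgeon sanitize encoder.onnx \
  --fold-constants \
  -o encoder_folded.onnx

# Retry the engine build on the sanitized graph.
trtexec --onnx=encoder_folded.onnx --saveEngine=encoder.plan
```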

yuekaizhang commented 3 weeks ago

@Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
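For example (a sketch; the 1422-1427 range matches the master link above and may drift in other icefall checkouts):

```bash
# From the icefall repo root: comment out the offending lines, then
# re-run the ONNX export. Adjust the range if your checkout differs.
sed -i '1422,1427 s/^/# /' egs/librispeech/ASR/zipformer/zipformer.py
```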

Vergissmeinicht commented 3 weeks ago

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.

It works for me. But when I try to use trtexec to convert my teammate's zipformer ONNX model, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any way to parse this Slice op?

yuekaizhang commented 3 weeks ago

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try to use trtexec to convert my teammate's zipformer ONNX model, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any way to parse this Slice op?

@Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.
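For example (a sketch; /usr/src/tensorrt/bin/trtexec is the usual location in NGC images, but please verify it inside your container):

```bash
# Build the engine inside the Triton 24.07 container, which ships a
# recent TensorRT (10.x) whose ONNX parser handles these Slice nodes.
docker run --gpus all --rm -it \
  -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/tritonserver:24.07-py3 \
  /usr/src/tensorrt/bin/trtexec \
    --onnx=encoder.onnx \
    --saveEngine=encoder.plan
```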

Vergissmeinicht commented 2 weeks ago

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try to use trtexec to convert my teammate's zipformer ONNX model, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any way to parse this Slice op?
>
> @Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.

I followed the latest tutorial and ran build_wenetspeech_zipformer_offline_trt.sh. It fails with an out-of-memory error: a tactic requests 34024 MB of device memory, while my 4090 Ti has 24217 MB available. Do you use a different GPU with more memory?

yuekaizhang commented 2 weeks ago

> I followed the latest tutorial and ran build_wenetspeech_zipformer_offline_trt.sh. It fails with an out-of-memory error: a tactic requests 34024 MB of device memory, while my 4090 Ti has 24217 MB available. Do you use a different GPU with more memory?

Are you using a larger model than the one in build_wenetspeech_zipformer_offline_trt.sh?

Would you mind changing this option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37
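For example (a sketch assuming the linked option is trtexec's --memPoolSize; the value is in MiB):

```bash
# Cap the builder's workspace pool below the GPU's physical VRAM so
# tactics requesting ~34 GB are skipped rather than aborting the build.
# 20480 MiB is an arbitrary value under 24 GB; tune as needed.
trtexec \
  --onnx=encoder.onnx \
  --memPoolSize=workspace:20480 \
  --saveEngine=encoder.plan
```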

Vergissmeinicht commented 2 weeks ago

> build_wenetspeech_zipformer_offline_trt.sh

I use the model downloaded at https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh#L47C5-L47C110. The Docker image I use is soar97/triton-k2:24.07.

Vergissmeinicht commented 2 weeks ago

> I followed the latest tutorial and ran build_wenetspeech_zipformer_offline_trt.sh. It fails with an out-of-memory error: a tactic requests 34024 MB of device memory, while my 4090 Ti has 24217 MB available. Do you use a different GPU with more memory?
>
> Are you using a larger model than the one in build_wenetspeech_zipformer_offline_trt.sh?
>
> Would you mind changing this option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37

Here's the build log; maybe something is different there. log.txt

Vergissmeinicht commented 5 days ago

> I followed the latest tutorial and ran build_wenetspeech_zipformer_offline_trt.sh. It fails with an out-of-memory error: a tactic requests 34024 MB of device memory, while my 4090 Ti has 24217 MB available. Do you use a different GPU with more memory?
>
> Are you using a larger model than the one in build_wenetspeech_zipformer_offline_trt.sh? Would you mind changing this option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37
>
> Here's the build log; maybe something is different there. log.txt

@yuekaizhang Hi, is there any progress on this problem? I'd appreciate your reply.