k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
483 stars 97 forks

added triton config for streaming zipformer #430

Closed uni-saurabh-vyas closed 1 year ago

uni-saurabh-vyas commented 1 year ago

Based on https://github.com/k2-fsa/sherpa/issues/412

yuekaizhang commented 1 year ago

Many thanks. Would you mind adding a script such that we could launch the streaming service with one button? e.g. build_librispeech_pruned_transducer_stateless7_streaming.sh

uni-saurabh-vyas commented 1 year ago

Yes, give me one day; I will push everything and add a README.md as well.

uni-saurabh-vyas commented 1 year ago

@yuekaizhang I have made the requested changes and tested with the bash script. The Triton server is up and running with all modules without any issues; I also tested via the decode_manifest client script and get ASR outputs.

Also, I feel a README file is not required, as the build script already lists all the steps to start the server; users should be able to follow it easily. Let me know if anything else is required.

yuekaizhang commented 1 year ago

All config.pbtxt, config.pbtxt.template, and model.py files under model_repo_streaming_zipformer/ might be the same as the files in model_repo_streaming, except for encoder/config.pbtxt.template. Would you mind checking them? (Run diff file1 file2 first, then decide whether we need a soft link.)

The bash scripts look nice. Thanks again.
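The suggested check can be sketched in shell: diff each candidate pair and, when the files are identical, keep one copy and soft-link the other. The layout below is a mock of the real model_repo_streaming* trees (only the scorer/1/model.py pair is shown), so this is an illustration of the workflow, not a script from the repo:

```shell
# Sketch of the dedup check: diff a candidate pair and, if identical,
# replace the duplicate with a soft link. The directory layout here is
# a mock of the real model_repo_streaming* trees.
demo="${TMPDIR:-/tmp}/sherpa_dedup_demo"
rm -rf "$demo"
mkdir -p "$demo/model_repo_streaming/scorer/1" \
         "$demo/model_repo_streaming_zipformer/scorer/1"
echo "same content" > "$demo/model_repo_streaming/scorer/1/model.py"
echo "same content" > "$demo/model_repo_streaming_zipformer/scorer/1/model.py"

a="$demo/model_repo_streaming/scorer/1/model.py"
b="$demo/model_repo_streaming_zipformer/scorer/1/model.py"
if diff -q "$a" "$b" >/dev/null; then
    ln -sf "$a" "$b"     # identical: soft link instead of a duplicate
fi
ls -l "$b"
```

Running `diff -q` first keeps the decision explicit; only pairs that compare byte-identical become links.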

uni-saurabh-vyas commented 1 year ago

I already checked them one by one; they are not the same. I made some changes (for example, I changed the type to FP32 to make it compatible with the pretrained model), and there was a tensor name change for the encoder outputs (x and x_lens instead of speech and speech_lengths) in feature_extractor.py.

The only same file is scorer/1/model.py

yuekaizhang commented 1 year ago

> I already checked them one by one; they are not the same. I made some changes (for example, I changed the type to FP32 to make it compatible with the pretrained model), and there was a tensor name change for the encoder outputs (x and x_lens instead of speech and speech_lengths) in feature_extractor.py.
>
> The only same file is scorer/1/model.py

I see. https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L811-L820 Actually, you could set --fp16 to export fp16.onnx. Then the whole model_repo could run inference in fp16, which could increase throughput a lot. I was wondering if you would like to try fp16.
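As a side note on why fp16 helps throughput: halving the bytes per element halves the memory traffic for every tensor, and memory bandwidth is often the bottleneck. A back-of-the-envelope sketch (the tensor shape is an arbitrary placeholder, not a real model dimension):

```shell
# A mock encoder activation of 16 x 100 x 512 elements:
# fp32 stores 4 bytes per element, fp16 stores 2, so the
# per-tensor memory footprint (and traffic) is halved.
elems=$((16 * 100 * 512))
fp32_bytes=$((elems * 4))
fp16_bytes=$((elems * 2))
echo "fp32: $fp32_bytes bytes, fp16: $fp16_bytes bytes"
```

The actual speedup depends on the model and GPU, but the 2x reduction in bytes moved is where most of the gain comes from.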

uni-saurabh-vyas commented 1 year ago

Yes, sure. I will try that; I didn't use that option before. I will update the script(s) accordingly to make them as generic as possible.

yuekaizhang commented 1 year ago

Also, it would be better if we made a PR against https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L377-L394 to change the string "x" to "speech" and "x_length" to "speech_lengths", so that we could reuse the feature_extractor code.

yuekaizhang commented 1 year ago

> Also, it would be better if we made a PR against https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L377-L394 to change the string "x" to "speech" and "x_length" to "speech_lengths", so that we could reuse the feature_extractor code.

Or, change the files under https://github.com/k2-fsa/sherpa/tree/master/triton/model_repo_streaming/feature_extractor and scripts/export_onnx.py from "speech" to "x", if that is more convenient for you.
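Either direction of the rename is a plain string substitution over the config templates. A sketch of the second option (the template body is a minimal mock, not the real file from model_repo_streaming/feature_extractor/, and the target name "x_lens" is an assumption based on the earlier comments in this thread):

```shell
# Sketch: rename the tensor names in a copy of the feature_extractor
# config template. The template content here is a minimal mock.
work="${TMPDIR:-/tmp}/sherpa_rename_demo"
rm -rf "$work" && mkdir -p "$work"
cat > "$work/config.pbtxt.template" <<'EOF'
output [
  { name: "speech" },
  { name: "speech_lengths" }
]
EOF

# Replace the longer name first so "speech_lengths" is not clobbered
# by the shorter "speech" substitution.
sed -i 's/"speech_lengths"/"x_lens"/g; s/"speech"/"x"/g' \
    "$work/config.pbtxt.template"
cat "$work/config.pbtxt.template"
```

The same substitution applied in reverse ("x" to "speech") would cover the icefall-side option from the previous comment.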

uni-saurabh-vyas commented 1 year ago

I am still a little confused as to why we have so many different ONNX export scripts.

yuekaizhang commented 1 year ago

> I am still a little confused as to why we have so many different ONNX export scripts.

Ideally, we would like to reuse the scripts from the sherpa-onnx project. However, several minor differences block it, e.g. whether processed_len is regarded as a state tensor for streaming ASR.

The workaround for now is to use the scripts/build_X bash scripts to avoid confusion. Also, the differences between conformer and zipformer mean we have to keep separate model_repo files and export scripts.
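For context on the processed_len point: when a tensor like processed_len is treated as state, Triton's sequence batcher can carry it across streaming chunks via implicit state in config.pbtxt, instead of the client round-tripping it. A hedged sketch of such a fragment (the tensor names, type, and dims below are illustrative, not taken from this repo):

```
sequence_batching {
  state [
    {
      input_name: "processed_len"
      output_name: "new_processed_len"
      data_type: TYPE_INT64
      dims: [ 1 ]
    }
  ]
}
```

Whether the exported ONNX graph exposes processed_len as such a state input/output pair is exactly the kind of minor difference that prevents sharing one export script.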

uni-saurabh-vyas commented 1 year ago

Added the required changes.

Also tested with fp16; the script works, but Triton gives warnings when loading.

I0708 04:34:38.286388 11446 model_lifecycle.cc:694] successfully loaded 'feature_extractor' version 1
2023-07-08 04:34:38.472352679 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.472412110 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:38.598780814 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.598831005 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:38.612756773 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow_1'
2023-07-08 04:34:38.612800814 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Pow node '/Pow'
2023-07-08 04:34:39.567382515 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

yuekaizhang commented 1 year ago

> Added the required changes.
>
> Also tested with fp16; the script works, but Triton gives warnings when loading.

These warnings are fine and can be ignored.

yuekaizhang commented 1 year ago

Many thanks @uni-saurabh-vyas