k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

support zipformer #350

Closed. alidabaghi123 closed this issue 1 year ago.

alidabaghi123 commented 1 year ago

Hello, thanks for your efforts! Do you support zipformer in the sherpa framework? I can export zipformer models for sherpa-onnx, but I cannot export them for the sherpa framework.

csukuangfj commented 1 year ago

Yes, of course:

- Non-streaming zipformer models
- Streaming zipformer models

Please refer to https://k2-fsa.github.io/icefall/model-export/index.html for how to export models from icefall.

Specifically, for streaming zipformer, please see

For non-streaming zipformer, please see


Also, please see examples at
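
For a concrete picture of what the export step produces, here is a minimal, self-contained TorchScript sketch. The toy module below is only a stand-in for the real icefall model, which the export scripts build from a trained checkpoint, but the resulting cpu_jit.pt is the kind of file sherpa loads:

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """A toy stand-in for the trained icefall zipformer encoder."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder computation; the real encoder is far more involved.
        return torch.relu(x)


model = TinyEncoder().eval()
scripted = torch.jit.script(model)  # compile the module to TorchScript
scripted.save("cpu_jit.pt")         # sherpa consumes TorchScript files like this
```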

alidabaghi123 commented 1 year ago

Thank you very much.

li563042811 commented 1 year ago

May I ask when the online websocket server will support zipformer? It seems that it does not support it now. @csukuangfj

csukuangfj commented 1 year ago

> May I ask when the online websocket server will support zipformer? It seems that it does not support it now. @csukuangfj

It is supported in the C++ websocket server.

Please have a look at our documentation: https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html

[screenshot: 2023-04-04 22:51]

uni-sagar-raikar commented 1 year ago

Do we have a Python server setup for streaming zipformer models? streaming_server.py doesn't seem to work with streaming zipformer models.

Thanks in advance,
Sagar

csukuangfj commented 1 year ago

> Do we have a Python server setup for streaming zipformer models? streaming_server.py doesn't seem to work with streaming zipformer models.

Currently, no. We have a C++ websocket server that performs better than its Python counterpart.

alidabaghi123 commented 1 year ago

Yes, I can't run it with a zipformer model.

uni-sagar-raikar commented 1 year ago

> > May I ask when the online websocket server will support zipformer? It seems that it does not support it now.
>
> It is supported in the C++ websocket server.
>
> Please have a look at our documentation: https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html

@csukuangfj In sherpa-online-websocket-server there seem to be no arguments for loading JIT models. Am I missing something here?

csukuangfj commented 1 year ago

> > It is supported in the C++ websocket server. Please have a look at our documentation: https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html
>
> @csukuangfj In sherpa-online-websocket-server there seem to be no arguments for loading JIT models. Am I missing something here?

Please run

```
sherpa-online-websocket-server --help
```

to view the help messages.

[screenshot: 2023-04-12 11:06]

uni-sagar-raikar commented 1 year ago

Thanks for the help, I missed the obvious part.

Also, I am now facing an issue when running the online websocket server with a GPU. I get the following error:

```
terminate called after throwing an instance of 'std::runtime_error'
  what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/zipformer.py", line 2809, in forward
    _85 = annotate(number, torch.add(_84, CONSTANTS.c1))
    cum_mask = torch.arange(1, _85, dtype=None, layout=None, device=torch.device("cpu"), pin_memory=False)
    _86 = torch.add(torch.unsqueeze(cum_mask, 1), torch.unsqueeze(cached_len4, 0))
          ~~~~~ <--- HERE
    _87 = torch.mul(torch.reciprocal(_86), CONSTANTS.c2)
    pooling_mask = torch.unsqueeze(_87, 2)
```

Is this because of some mistake we made during JIT tracing? We are using [jit_trace_export.py](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/jit_trace_export.py).

csukuangfj commented 1 year ago

Which version of PyTorch are you using? Also, are you using our provided pre-trained streaming zipformer, or did you export the model yourself?

uni-sagar-raikar commented 1 year ago

The torch version is 1.13.1; I am using the standard sherpa Docker image.

As for the zipformer model, we exported it ourselves after training it with icefall.

csukuangfj commented 1 year ago

We have never seen this error before. Does it work with our pre-trained model listed in the doc?

uni-sagar-raikar commented 1 year ago

No, it does not work with the pre-trained model either. Please note that I am enabling --use-gpu=true, and this is what causes the issue.

csukuangfj commented 1 year ago

> No, it does not work with the pre-trained model either. Please note that I am enabling --use-gpu=true, and this is what causes the issue.

I see. So it works with --use-gpu=false?

Please export a CUDA version of the traced model if you want to use --use-gpu=true.
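
To see why this matters, here is a small, self-contained demonstration (a toy function, not the zipformer code): torch.jit.trace freezes the concrete devices used at trace time into the graph, which is exactly the device=torch.device("cpu") constant visible in the traceback above.

```python
import torch


def f(x: torch.Tensor) -> torch.Tensor:
    # At trace time x lives on the CPU, so the traced graph records
    # torch.arange(..., device="cpu") as a constant.
    mask = torch.arange(x.shape[0], device=x.device)
    return x + mask


traced = torch.jit.trace(f, torch.zeros(3))  # traced with a CPU example input

if torch.cuda.is_available():
    try:
        traced(torch.zeros(3, device="cuda"))
    except RuntimeError as e:
        # Expected all tensors to be on the same device,
        # but found at least two devices, cuda:0 and cpu!
        print(e)
```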

uni-sagar-raikar commented 1 year ago

What's the CUDA version for the pre-trained LibriSpeech model? Also, how would a CUDA export make a difference here? I checked, and torch can access the CUDA devices.

csukuangfj commented 1 year ago

Please remove https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/jit_trace_export.py#L290 when exporting the model to CUDA using JIT trace.

uni-sagar-raikar commented 1 year ago

I tried this, with no success. Do I need to explicitly move some model onto the GPU while tracing? There are multiple places where the model is moved to a device.

csukuangfj commented 1 year ago

> I tried this, with no success.

Did you fail to export the model with CUDA, or did you fail to run the CUDA-exported model with sherpa?

Could you please post the error logs?

uni-sagar-raikar commented 1 year ago

It failed at exporting the model with CUDA in the first place.

```
[I] /workspace/sherpa/sherpa/cpp_api/online-recognizer.cc:403:void sherpa::OnlineRecognizer::OnlineRecognizerImpl::WarmUp() 2023-04-12 18:13:21.070 WarmUp begins
terminate called after throwing an instance of 'std::runtime_error'
  what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/zipformer.py", line 2809, in forward
    _85 = annotate(number, torch.add(_84, CONSTANTS.c1))
    cum_mask = torch.arange(1, _85, dtype=None, layout=None, device=torch.device("cpu"), pin_memory=False)
    _86 = torch.add(torch.unsqueeze(cum_mask, 1), torch.unsqueeze(cached_len4, 0))
          ~~~~~ <--- HERE
    _87 = torch.mul(torch.reciprocal(_86), CONSTANTS.c2)
    pooling_mask = torch.unsqueeze(_87, 2)
```

Export logs:

```
/mnt/efs/dspavankumar/tools/icefall/egs/en-us/pruned_transducer_stateless7_streaming/jit_trace_export_gpu.py(104): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

csukuangfj commented 1 year ago

Please move

https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/jit_trace_export.py#L130

```python
x = torch.zeros(1, T, 80, dtype=torch.float32)
x_lens = torch.full((1,), T, dtype=torch.int32)
states = encoder_model.get_init_state(device=x.device)
```

to CUDA.
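
One possible shape of the change (a sketch only; T, encoder_model, and the rest of the script are assumed from jit_trace_export.py, and moving the model itself is an extra step that the device-mismatch error suggests):

```python
# Hypothetical adaptation: create the dummy inputs on the CUDA device before
# tracing, so the traced graph records CUDA tensors instead of CPU ones.
device = torch.device("cuda", 0)
encoder_model.to(device)  # the model must live on the same device as the inputs

x = torch.zeros(1, T, 80, dtype=torch.float32, device=device)
x_lens = torch.full((1,), T, dtype=torch.int32, device=device)
states = encoder_model.get_init_state(device=x.device)
```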

(I thought you would be able to fix this by yourself.)

csukuangfj commented 1 year ago

> It failed at exporting the model with CUDA in the first place.
>
> […]

Please use the latest export.py from icefall, i.e., the changes from https://github.com/k2-fsa/icefall/pull/1005

We now support passing cpu_jit.pt. Please also see https://github.com/k2-fsa/sherpa/pull/365 and our updated doc: https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html#icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29

[screenshot: 2023-04-17 21:41]
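
Roughly why cpu_jit.pt sidesteps the device problem (a toy sketch, not the icefall code): torch.jit.script compiles the model's Python code itself, so device-dependent ops follow the input device at run time instead of being frozen at trace time, and the same saved file can then run on CPU or be moved to CUDA.

```python
import torch
import torch.nn as nn


class Toy(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Under torch.jit.script this stays dynamic: the device is read
        # from x at run time rather than baked in as a constant.
        mask = torch.arange(x.shape[0], device=x.device)
        return x + mask


m = torch.jit.script(Toy().eval())
m.save("cpu_jit.pt")

loaded = torch.jit.load("cpu_jit.pt")
print(loaded(torch.zeros(3)))  # runs on CPU
if torch.cuda.is_available():
    loaded_cuda = loaded.to("cuda")
    print(loaded_cuda(torch.zeros(3, device="cuda")))  # and on CUDA
```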

csukuangfj commented 1 year ago

Closing via #365