k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

RunTime error when running online zipformer #490

Closed · ezerhouni closed this 11 months ago

ezerhouni commented 11 months ago

Hello!

We are trying to run a zipformer2 model trained on our data with icefall. However, we are getting the following error:

e-websocket-server-impl.cc:328:void sherpa::OnlineWebsocketServer::OnOpen(connection_hdl)"}
terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/encoder_jit.py", line 41, in forward
    encoder_states = torch.slice(states, None, -2)
    encoder = self.encoder
    _8 = (encoder).streaming_forward(x0, x_lens, encoder_states, src_key_padding_mask0, )
          ~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    encoder_out, encoder_out_lens, new_encoder_states, = _8
    encoder_out0 = torch.permute(encoder_out, [1, 0, 2])
  File "code/__torch__/zipformer.py", line 436, in streaming_forward
    _113 = torch.floordiv((left_context_frames)[0], ds)
    _114 = torch.slice(src_key_padding_mask, -1, None, None, ds)
    _115 = (_0).streaming_forward(x14, _112, _113, _114, )
            ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    x15, new_layer_states, = _115
    layer_offset = torch.add(0, num_layers)
  File "code/__torch__/zipformer.py", line 565, in streaming_forward
    _1 = getattr(layers, "1")
    cached_key, cached_nonlin_attn, cached_val1, cached_val2, cached_conv1, cached_conv2, = torch.slice(states, 0, 6)
    _154 = (_0).streaming_forward(src, pos_emb, cached_key, cached_nonlin_attn, cached_val1, cached_val2, cached_conv1, cached_conv2, left_context_len, src_key_padding_mask, )
            ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    output, new_cached_key, new_cached_nonlin_attn, new_cached_val1, new_cached_val2, new_cached_conv1, new_cached_conv2, = _154
    _155 = [new_cached_key, new_cached_nonlin_attn, new_cached_val1, new_cached_val2, new_cached_conv1, new_cached_conv2]
  File "code/__torch__/zipformer.py", line 796, in streaming_forward
    conv_module1 = self.conv_module1
    _205 = torch.slice(torch.slice(src_key_padding_mask), 1, left_context_len)
    _206 = (conv_module1).streaming_forward(src15, cached_conv1, _205, )
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    src_conv, cached_conv16, = _206
    src16 = torch.add(src15, src_conv)

Traceback of TorchScript, original code (most recent call last):
  File "code/encoder_jit.py", line 98, in forward
            encoder_out_lens,
            new_encoder_states,
        ) = self.encoder.streaming_forward(
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            x=x,
            x_lens=x_lens,
  File "code/zipformer.py", line 451, in streaming_forward
            x = convert_num_channels(x, self.encoder_dim[i])

            x, new_layer_states = module.streaming_forward(
                                  ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                x,
                states=states[layer_offset * 6 : (layer_offset + num_layers) * 6],
  File "code/zipformer.py", line 1123, in streaming_forward
                new_cached_conv1,
                new_cached_conv2,
            ) = mod.streaming_forward(
                ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                output,
                pos_emb,
  File "code/zipformer.py", line 944, in streaming_forward
        src = src + self_attn

        src_conv, cached_conv1 = self.conv_module1.streaming_forward(
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            src,
            cache=cached_conv1,
RuntimeError: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 3)

Any idea where it might come from? We are using PyTorch 2.0.0 (planning to upgrade to 2.0.1 or 2.1 to see if the bug is still there).

Thank you very much!
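For context, 18446744073709551615 is 2**64 - 1, i.e. what a -1 index becomes once it is reinterpreted as an unsigned 64-bit size_t on the C++ side, so a range check failing with this value usually means a negative index or dimension (such as dim=-1) reached code that expects a non-negative value. A quick check of the arithmetic:

```python
# 2**64 - 1 matches the value reported by vector::_M_range_check above;
# it is what -1 looks like when reinterpreted as an unsigned 64-bit integer.
assert 2**64 - 1 == 18446744073709551615
assert (-1) % 2**64 == 18446744073709551615
```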

ezerhouni commented 11 months ago

FYI, it seems that the pre-trained model has the same issue:

terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 41, in forward
    encoder_states = torch.slice(states, None, -2)
    encoder = self.encoder
    _8 = (encoder).streaming_forward(x0, x_lens, encoder_states, src_key_padding_mask0, )
          ~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    encoder_out, encoder_out_lens, new_encoder_states, = _8
    encoder_out0 = torch.permute(encoder_out, [1, 0, 2])
  File "code/__torch__/zipformer.py", line 434, in streaming_forward
    _108 = torch.floordiv((left_context_frames)[0], ds)
    _109 = torch.slice(src_key_padding_mask, -1, None, None, ds)
    _110 = (_0).streaming_forward(x14, _107, _108, _109, )
            ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    x15, new_layer_states, = _110
    layer_offset = torch.add(0, num_layers)
  File "code/__torch__/zipformer.py", line 563, in streaming_forward
    _1 = getattr(layers, "1")
    cached_key, cached_nonlin_attn, cached_val1, cached_val2, cached_conv1, cached_conv2, = torch.slice(states, 0, 6)
    _148 = (_0).streaming_forward(src, pos_emb, cached_key, cached_nonlin_attn, cached_val1, cached_val2, cached_conv1, cached_conv2, left_context_len, src_key_padding_mask, )
            ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    output, new_cached_key, new_cached_nonlin_attn, new_cached_val1, new_cached_val2, new_cached_conv1, new_cached_conv2, = _148
    _149 = [new_cached_key, new_cached_nonlin_attn, new_cached_val1, new_cached_val2, new_cached_conv1, new_cached_conv2]
  File "code/__torch__/zipformer.py", line 794, in streaming_forward
    conv_module1 = self.conv_module1
    _198 = torch.slice(torch.slice(src_key_padding_mask), 1, left_context_len)
    _199 = (conv_module1).streaming_forward(src15, cached_conv1, _198, )
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    src_conv, cached_conv16, = _199
    src16 = torch.add(src15, src_conv)

Traceback of TorchScript, original code (most recent call last):
  File "./zipformer/export.py", line 289, in forward
            encoder_out_lens,
            new_encoder_states,
        ) = self.encoder.streaming_forward(
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            x=x,
            x_lens=x_lens,
  File "/ceph-zw/workspace/zipformer/icefall_zipformer/egs/librispeech/ASR/zipformer/zipformer.py", line 441, in streaming_forward
            x = convert_num_channels(x, self.encoder_dim[i])

            x, new_layer_states = module.streaming_forward(
                                  ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                x,
                states=states[layer_offset * 6 : (layer_offset + num_layers) * 6],
  File "/ceph-zw/workspace/zipformer/icefall_zipformer/egs/librispeech/ASR/zipformer/zipformer.py", line 1026, in streaming_forward
                new_cached_conv1,
                new_cached_conv2
            ) = mod.streaming_forward(
                ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                output,
                pos_emb,
  File "/ceph-zw/workspace/zipformer/icefall_zipformer/egs/librispeech/ASR/zipformer/zipformer.py", line 856, in streaming_forward
        src = src + self_attn

        src_conv, cached_conv1 = self.conv_module1.streaming_forward(
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            src,
            cache=cached_conv1,
RuntimeError: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 3)

csukuangfj commented 11 months ago

Are you using the latest code?

I think it has been fixed in https://github.com/k2-fsa/icefall/pull/1131

ezerhouni commented 11 months ago

Thank you! I will try it out and let you know. Also, I am seeing other places with chunk(.., dim=-1); should we take care of those? If so, I can create a PR.
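For anyone landing on this later, here is a minimal sketch of the kind of rewrite being discussed, assuming the offending calls are chunk(.., dim=-1) inside a scripted streaming_forward (the tensor shape and variable names below are illustrative, not taken from zipformer.py):

```python
import torch

# Illustrative stand-in for whatever activation is being split in zipformer.py.
x = torch.randn(2, 3, 8)

# Original style: negative dim.
a, b = x.chunk(2, dim=-1)

# Same split with an explicitly non-negative dim, so no -1 has to be
# normalized on the C++ side of the scripted model.
a2, b2 = x.chunk(2, dim=x.dim() - 1)

assert torch.equal(a, a2) and torch.equal(b, b2)
```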

csukuangfj commented 11 months ago

> Thank you! I will try it out and let you know. Also, I am seeing other places with chunk(.., dim=-1); should we take care of those? If so, I can create a PR.

Yes, please do it.