wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

wenetruntime with paraformer gives a TypeError #1896

Closed egorsmkv closed 10 months ago

egorsmkv commented 1 year ago

Describe the bug

I have successfully trained a Paraformer model and exported it to TorchScript (JIT) format, and I want to use the wenetruntime library to test decoding.

To Reproduce

Steps to reproduce the behavior:

My code:

import sys
import torch
import wenetruntime as wenet

wav_file = '/home/yehor/ext-ml-disk/asr/test_file.wav'
decoder = wenet.Decoder(model_dir='models/paraformer_model')
ans = decoder.decode_wav(wav_file)
print(ans)

Expected behavior

A transcription.

Error

Traceback (most recent call last):
  File "recognize_wenet.py", line 8, in <module>
    ans = decoder.decode_wav(wav_file)
  File "/home/yehor/Tools/anaconda3/envs/my/lib/python3.8/site-packages/wenetruntime/decoder.py", line 114, in decode_wav
    return self.decode(wav, True)
  File "/home/yehor/Tools/anaconda3/envs/my/lib/python3.8/site-packages/wenetruntime/decoder.py", line 96, in decode
    _wenet.wenet_decode(self.d, pcm, len(pcm), finish)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/wenet/paraformer/paraformer.py", line 131, in forward_attention_decoder
    r_hyps2 = torch.cat([_24, r_hyps1], 1)
    decoder = self.decoder
    _25 = (decoder).forward(encoder_out0, encoder_mask, hyps, hyps_lens, r_hyps2, reverse_weight, )
           ~~~~~~~~~~~~~~~~ <--- HERE
    decoder_out, r_decoder_out, _26, = _25
    decoder_out0 = _18(decoder_out, -1, 3, None, )
  File "code/__torch__/wenet/transformer/decoder.py", line 38, in forward
    _40 = getattr(decoders, "4")
    _5 = getattr(decoders, "5")
    _6 = (_00).forward(x, tgt_mask1, memory, memory_mask, None, )
          ~~~~~~~~~~~~ <--- HERE
    x0, tgt_mask2, memory0, memory_mask0, = _6
    _7 = (_10).forward(x0, tgt_mask2, memory0, memory_mask0, None, )
  File "code/__torch__/wenet/transformer/decoder_layer.py", line 25, in forward
    if normalize_before:
      norm1 = self.norm1
      tgt0 = (norm1).forward(tgt, )
              ~~~~~~~~~~~~~~ <--- HERE
    else:
      tgt0 = tgt
  File "code/__torch__/torch/nn/modules/normalization.py", line 16, in forward
    weight = self.weight
    bias = self.bias
    _1 = _0(input, [256], weight, bias, 1.0000000000000001e-05, )
         ~~ <--- HERE
    return _1
  File "code/__torch__/torch/nn/functional.py", line 86, in layer_norm
    bias: Optional[Tensor]=None,
    eps: float=1.0000000000000001e-05) -> Tensor:
  _8 = torch.layer_norm(input, normalized_shape, weight, bias, eps)
       ~~~~~~~~~~~~~~~~ <--- HERE
  return _8
def silu(input: Tensor,

Traceback of TorchScript, original code (most recent call last):
  File "/home/yehor/ext-ml-disk/github/wenet-2.2.1/wenet/transformer/asr_model.py", line 951, in forward_attention_decoder
        #   >>>         [sos, 2, eos, eos]])

        decoder_out, r_decoder_out, _ = self.decoder(
                                        ~~~~~~~~~~~~ <--- HERE
            encoder_out, encoder_mask, hyps, hyps_lens, r_hyps,
            reverse_weight)  # (num_hyps, max_hyps_len, vocab_size)
  File "/home/yehor/ext-ml-disk/github/wenet-2.2.1/wenet/transformer/decoder.py", line 136, in forward
        x, _ = self.embed(tgt)
        for layer in self.decoders:
            x, tgt_mask, memory, memory_mask = layer(x, tgt_mask, memory,
                                               ~~~~~ <--- HERE
                                                     memory_mask)
        if self.normalize_before:
  File "/home/yehor/ext-ml-disk/github/wenet-2.2.1/wenet/transformer/decoder_layer.py", line 92, in forward
        residual = tgt
        if self.normalize_before:
            tgt = self.norm1(tgt)
                  ~~~~~~~~~~ <--- HERE

        if cache is None:
  File "/home/yehor/Tools/anaconda3/envs/my/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.layer_norm(
               ~~~~~~~~~~~~ <--- HERE
            input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/home/yehor/Tools/anaconda3/envs/my/lib/python3.8/site-packages/torch/nn/functional.py", line 2515, in layer_norm
            layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
        )
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
           ~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: Given normalized_shape=[256], expected input with shape [*, 256], but got input of size[10, 23]

Exception ignored in: <function Decoder.__del__ at 0x7f6aaa715c10>
Traceback (most recent call last):
  File "/home/yehor/Tools/anaconda3/envs/my/lib/python3.8/site-packages/wenetruntime/decoder.py", line 56, in __del__
TypeError: 'NoneType' object is not callable
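One plausible reading of the final RuntimeError: the decoder's first LayerNorm (normalized_shape=[256], the model dimension) received a 2-D tensor of shape [10, 23] — which looks like raw token ids (num_hyps, max_hyps_len) rather than embedded hidden states (num_hyps, max_hyps_len, 256). In other words, the runtime's attention-rescoring path may not be feeding the exported Paraformer the inputs it expects. A minimal sketch reproducing the same error class (this only illustrates the shape mismatch, it is not the fix):

```python
import torch

# LayerNorm over the model dimension, as in the decoder (d_model = 256)
ln = torch.nn.LayerNorm(256)

# Correct input: embedded hypotheses, (num_hyps, max_hyps_len, d_model)
hidden = torch.randn(10, 23, 256)
print(ln(hidden).shape)  # works: torch.Size([10, 23, 256])

# What the traceback suggests was passed: a 2-D (num_hyps, max_hyps_len)
# tensor of token ids, with no feature dimension
token_ids = torch.randn(10, 23)
try:
    ln(token_ids)
except RuntimeError as e:
    # "Given normalized_shape=[256], expected input with shape [*, 256],
    #  but got input of size[10, 23]" -- the same error as above
    print(e)
```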

Versions:

couldn commented 1 year ago

same question

egorsmkv commented 1 year ago

Hello, any update on this?

robin1001 commented 1 year ago

We haven't tested Paraformer in the runtime yet. I'm afraid we don't have the bandwidth to fix this problem at the moment.

xingchensong commented 10 months ago

Please use the CLI; wenetruntime is deprecated.
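For readers landing here: the replacement is the `wenet` pip package and its command-line interface. A minimal sketch (the model/language name and flags below are illustrative — check `wenet --help` and the repo README for the options your version supports):

```shell
# Install the wenet package, which supersedes wenetruntime
pip install wenet

# Decode a wav file with a pretrained model (language name is illustrative)
wenet --language chinese test_file.wav
```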