k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

RuntimeError when running sherpa-onnx - Dimension Mismatch #235

Closed lightfate closed 1 year ago

lightfate commented 1 year ago

Hello,

I'm trying to use the sherpa-onnx Python API to transcribe audio files with the zipformer model. However, I'm encountering an error indicating a dimension mismatch between the input data and the model's expectations.

Here is the command I'm running:

python offline-decode-files.py \
  --tokens=./sherpa-onnx-zipformer-en-2023-04-01/tokens.txt \
  --encoder=./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-zipformer-en-2023-04-01/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-zipformer-en-2023-04-01/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/0.wav \
  ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/1.wav \
  ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/8k.wav

And here is the error message I'm getting:

Started!
D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\offline-stream.cc:AcceptWaveformImpl:108 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000
Traceback (most recent call last):
  File "offline-decode-files.py", line 340, in <module>
    main()
  File "offline-decode-files.py", line 319, in main
    recognizer.decode_streams(streams)
  File "D:\004-Workspace\pycharm\sherpa\venv\lib\site-packages\sherpa_onnx\offline_recognizer.py", line 242, in decode_streams
    self.recognizer.decode_streams(ss)
RuntimeError: Got invalid dimensions for input: x for the following indices
 index: 1 Got: 1764 Expected: 39
 Please fix either the inputs or the model.

From the error message, it seems that the input data's dimensions don't match what the model expects. However, I'm not sure why this is the case, as I'm using the provided offline-decode-files.py script together with your test WAV files and your models.

I would greatly appreciate any insights or advice on how to resolve this issue. Thank you in advance for your help!

Here is my environment: Python 3.8

(venv) PS D:\004-Workspace\pycharm\sherpa> pip list
Package       Version
------------- -------
numpy         1.24.4
pip           22.3.1
sentencepiece 0.1.96
setuptools    65.5.1
sherpa-onnx   1.5.5
wheel         0.38.4

csukuangfj commented 1 year ago

Could you pass the option

--debug=true

when you invoke

python offline-decode-files.py

and show the output?

lightfate commented 1 year ago

Like this:

(venv) PS D:\004-Workspace\pycharm\sherpa> python offline-decode-files.py --tokens=./sherpa-onnx-zipformer-en-2023-04-01/tokens.txt --encoder=./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx --decoder=./sherpa-onnx-zipformer-en-2023-04-01/decoder-epoch-99-avg-1.onnx --joiner=./sherpa-onnx-zipformer-en-2023-04-01/joiner-epoch-99-avg-1.onnx ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/0.wav ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/1.wav ./sherpa-onnx-zipformer-en-2023-04-01/test_wavs/8k.wav sherpa --debug=true

D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\offline-transducer-model.cc:InitEncoder:141 ---encoder---
encoder_dims=384,384,384,384,384
version=1
model_type=zipformer
model_author=k2-fsa
attention_dims=192,192,192,192,192
decode_chunk_len=32
num_encoder_layers=2,4,3,2,4
T=39
cnn_module_kernels=31,31,31,31,31
left_context_len=64,32,16,8,32

D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\offline-transducer-model.cc:InitDecoder:161 ---decoder--- vocab_size=6254 context_size=2

D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\offline-transducer-model.cc:InitJoiner:185 ---joiner--- joiner_dim=512

Started! D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx\csrc\offline-stream.cc:AcceptWaveformImpl:108 Creating a resampler: in_sample_rate: 8000 output_sample_rate: 16000

Traceback (most recent call last):
  File "offline-decode-files.py", line 340, in <module>
    main()
  File "offline-decode-files.py", line 305, in main
    assert_file_exists(wave_filename)
  File "offline-decode-files.py", line 191, in assert_file_exists
    assert Path(filename).is_file(), (
AssertionError: sherpa does not exist!
Please refer to https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html to download it

csukuangfj commented 1 year ago

Here is the output on my side when using --debug=true:

/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-transducer-model.cc:InitEncoder:141 ---encoder---
model_author=k2-fsa
model_type=zipformer
version=1
comment=stateless7

/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-transducer-model.cc:InitDecoder:161 ---decoder---
vocab_size=500
context_size=2

/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-transducer-model.cc:InitJoiner:185 ---joiner---
joiner_dim=512

Started!
/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:108 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Please make sure you are using the correct model.

Please show the sha256sum of your model files.

csukuangfj commented 1 year ago
(py38) fangjuns-MacBook-Pro:sherpa-onnx-zipformer-en-2023-04-01 fangjun$ shasum -a 256 encoder-epoch-99-avg-1.onnx
7d495012dd5b7ba008143f0c9cb52f3fd97ab0f208923d6a03be0e7db0cd4a4d  encoder-epoch-99-avg-1.onnx
(py38) fangjuns-MacBook-Pro:sherpa-onnx-zipformer-en-2023-04-01 fangjun$ ls -lh encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   337M Apr  2 17:43 encoder-epoch-99-avg-1.onnx
(py38) fangjuns-MacBook-Pro:sherpa-onnx-zipformer-en-2023-04-01 fangjun$ ls -l encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff  353667745 Apr  2 17:43 encoder-epoch-99-avg-1.onnx

Please make sure your output matches the above output.
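
For readers on Windows, where shasum is not available, the same comparison can be sketched in Python with the standard hashlib module. The helper name sha256_of_file is made up for illustration; the expected digest and the file path are the ones shown above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest of the non-streaming encoder, copied from the shasum output above.
EXPECTED = "7d495012dd5b7ba008143f0c9cb52f3fd97ab0f208923d6a03be0e7db0cd4a4d"

if __name__ == "__main__":
    model = Path("sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx")
    if model.is_file():
        actual = sha256_of_file(model)
        print(actual, "OK" if actual == EXPECTED else "MISMATCH")
    else:
        print(f"{model} not found")
```

A mismatch only says that the file differs from the one hashed above; it does not say which model the file actually is.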

csukuangfj commented 1 year ago

I suspect that you are using a streaming zipformer model and that its files somehow ended up in the folder sherpa-onnx-zipformer-en-2023-04-01/.

lightfate commented 1 year ago

(venv) PS D:\004-Workspace\pycharm\sherpa> CertUtil -hashfile ./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx SHA256
SHA256 hash of ./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx:
709f0ed53a734b7942f170127e7547b566cb29c4afc5e67719f314c3d63ccb10

I am using the model csukuangfj/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 (Bilingual, Chinese + English). Can I use this bilingual Chinese-English model with offline-decode-files.py?

csukuangfj commented 1 year ago

No, that model is a streaming model, which can only be used for online decoding.

Please use https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/online-decode-files.py
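
One way to catch this kind of mix-up early is to look at the encoder's metadata before choosing a script. The sketch below is a heuristic based only on the two --debug outputs in this thread: the streaming encoder reported keys such as decode_chunk_len, T and left_context_len, while the non-streaming one did not. The key set and the helper names are assumptions for illustration, not an official sherpa-onnx rule. Reading metadata from a file uses onnxruntime's get_modelmeta().custom_metadata_map.

```python
# Keys that appeared only in the streaming encoder's --debug output in this
# thread. Treating them as a streaming marker is a heuristic, not an official rule.
STREAMING_HINT_KEYS = {"decode_chunk_len", "left_context_len", "T"}


def looks_like_streaming(metadata):
    """Guess whether encoder metadata belongs to a streaming zipformer."""
    return any(key in metadata for key in STREAMING_HINT_KEYS)


def read_onnx_metadata(path):
    """Read the custom metadata of an ONNX file (requires onnxruntime)."""
    import onnxruntime as ort  # lazy import: the helper above needs no dependency

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    return session.get_modelmeta().custom_metadata_map


# Metadata copied (partially) from the two --debug outputs earlier in this thread.
reported_by_lightfate = {
    "model_type": "zipformer",
    "decode_chunk_len": "32",
    "T": "39",
    "left_context_len": "64,32,16,8,32",
}
reported_by_csukuangfj = {
    "model_author": "k2-fsa",
    "model_type": "zipformer",
    "version": "1",
    "comment": "stateless7",
}

print(looks_like_streaming(reported_by_lightfate))   # True
print(looks_like_streaming(reported_by_csukuangfj))  # False
```

With a real file, looks_like_streaming(read_onnx_metadata("encoder-epoch-99-avg-1.onnx")) would apply the same guess to the model on disk.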

lightfate commented 1 year ago

OK, thank you so much!

lonngxiang commented 6 months ago

AssertionError: sample-rate=16000 does not exist!

Do input audio files have to be sampled at 16000 Hz?

csukuangfj commented 6 months ago

Please post the complete command you are using.

lonngxiang commented 6 months ago

The file I saved has this sample rate: Frame rate: 48000

Running the transcription fails with an error:

python offline-decode-files.py --tokens=C:\Users\loong\Downloads\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\tokens.txt --paraformer=C:\Users\loong\Downloads\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\model.onnx --num-threads=2 --decoding-method=greedy_search --debug=True sample-rate=16000 feature-dim=80 audio.wav
csukuangfj commented 6 months ago

Please post the complete error log.

lonngxiang commented 6 months ago

Do I need to convert the WAV file's sample rate beforehand for it to work?


Started!
Traceback (most recent call last):
  File "D:\project\MeloTTS\offline-decode-files.py", line 472, in <module>
    main()
  File "D:\project\MeloTTS\offline-decode-files.py", line 442, in main
    assert_file_exists(wave_filename)
  File "D:\project\MeloTTS\offline-decode-files.py", line 290, in assert_file_exists
    assert Path(filename).is_file(), (
AssertionError: sample-rate=16000 does not exist!
Please refer to https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html to download it 
csukuangfj commented 6 months ago
sample-rate=16000 feature-dim=80 audio.wav

You typed the command incorrectly.

You changed

--sample-rate

to

sample-rate

You left out the leading --.
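
To see why the missing -- produces a "does not exist" error rather than an argument error, here is a minimal argparse sketch. It is not the real script's parser: only the option names --sample-rate and --feature-dim and the positional sound_files come from this thread, while the defaults and the helper suspicious_positionals are assumptions for illustration. Without the dashes, argparse treats sample-rate=16000 as just another positional sound file, which is then checked for existence.

```python
import argparse


def build_parser():
    # A cut-down parser with only the options relevant here; the real script
    # defines many more. The defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample-rate", type=int, default=16000)
    parser.add_argument("--feature-dim", type=int, default=80)
    parser.add_argument("sound_files", nargs="+")
    return parser


def suspicious_positionals(sound_files):
    """Return positional arguments that look like options missing their '--'."""
    return [name for name in sound_files if "=" in name and not name.startswith("-")]


args = build_parser().parse_args(["sample-rate=16000", "feature-dim=80", "audio.wav"])
print(args.sound_files)                          # ['sample-rate=16000', 'feature-dim=80', 'audio.wav']
print(suspicious_positionals(args.sound_files))  # ['sample-rate=16000', 'feature-dim=80']
```

The same mechanism explains the earlier "sherpa does not exist!" error in this thread, where a stray word on the command line was taken as a file name.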

lonngxiang commented 6 months ago

Yes, I did leave that out. But after adding it, why is there no recognition result in the output?

None of this gets printed:

results = [s.result.text for s in streams]
end_time = time.time()
print(results)
print("Done!")

for wave_filename, result in zip(args.sound_files, results):
    print(f"{wave_filename}\n{result}")
    print("-" * 10)

lonngxiang commented 6 months ago

There are also errors like the following when I switch to a different WAV file:


Started!
Traceback (most recent call last):
  File "D:\project\MeloTTS\offline-decode-files.py", line 473, in <module>
    main()
  File "D:\project\MeloTTS\offline-decode-files.py", line 443, in main
    samples, sample_rate = read_wave(wave_filename)
  File "D:\project\MeloTTS\offline-decode-files.py", line 312, in read_wave
    assert f.getsampwidth() == 2, f.getsampwidth()  # it is in bytes
AssertionError: 4

  File "D:\project\MeloTTS\offline-decode-files.py", line 473, in <module>
    main()
  File "D:\project\MeloTTS\offline-decode-files.py", line 443, in main
    samples, sample_rate = read_wave(wave_filename)
  File "D:\project\MeloTTS\offline-decode-files.py", line 310, in read_wave
    with wave.open(wave_filename) as f:
  File "C:\Users\loong\.conda\envs\nlp\lib\wave.py", line 509, in open
    return Wave_read(f)
  File "C:\Users\loong\.conda\envs\nlp\lib\wave.py", line 163, in __init__
    self.initfp(f)
  File "C:\Users\loong\.conda\envs\nlp\lib\wave.py", line 130, in initfp
    raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
csukuangfj commented 6 months ago

We only support the WAVE format. Please read the Python code and debug it yourself.
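
Both tracebacks above can be reproduced ahead of time with Python's standard wave module, which is what read_wave uses. The sketch below is a hypothetical pre-flight check, not part of the example script: check_wave is a made-up name, and the only rule taken from the thread is the assertion that the sample width must be 2 bytes.

```python
import wave


def check_wave(path):
    """Return (ok, message) saying whether the file passes the checks seen above."""
    try:
        with wave.open(path, "rb") as f:
            channels = f.getnchannels()
            width = f.getsampwidth()  # in bytes; 2 means 16-bit samples
            rate = f.getframerate()
    except wave.Error as e:
        # For example "file does not start with RIFF id" when another audio
        # format has merely been renamed to .wav.
        return False, f"not a readable WAVE file: {e}"
    if width != 2:
        return False, f"sample width is {width} bytes, expected 2 (16-bit)"
    return True, f"{channels} channel(s), 16-bit, {rate} Hz"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, check_wave(name))
```

A file that fails the width check needs to be re-exported as 16-bit PCM; a file that fails the RIFF check needs to be converted to WAVE by an audio tool first.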

csukuangfj commented 6 months ago

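If you are using our test audio and there is no result in the output, we can take a look.

If the problem lies in how you invoke the script, or in using the wrong audio format, you will need to solve it yourself.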

lonngxiang commented 6 months ago

OK, got it.

lonngxiang commented 6 months ago

But this ordinary audio file produces no error message at all. Could you help find the cause? How can I send you the WAV file so you can take a look?

lonngxiang commented 6 months ago

Here is the audio, compressed as a zip file:

ezyzip.zip

csukuangfj commented 6 months ago

Please post the command you ran.

lonngxiang commented 6 months ago

This model: comment=speech_seaco_paraformer_large_asr_nat-zh-cantonese-en-16k-common-vocab11666-pytorch

python offline-decode-files.py --tokens=C:\Users\loong\Downloads\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\tokens.txt --paraformer=C:\Users\loong\Downloads\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\sherpa-onnx-paraformer-trilingual-zh-cantonese-en\model.onnx --num-threads=2 --decoding-method=greedy_search --debug=True --sample-rate=16000 --feature-dim=80 audio123.wav
lonngxiang commented 6 months ago

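@csukuangfj Also, these two WAV files still cannot be recognized even after converting them to 1 channel and a 16000 Hz sample rate. There is no error: after printing "Started!" the program simply exits with no further output. I tested other tools here and they can recognize the audio.

ezyzip1.zip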

csukuangfj commented 6 months ago

It works on my side. If it cannot be recognized on your side, I don't know why.

This Hugging Face space, https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition, can also recognize it. Please look for the cause on your side.

lonngxiang commented 6 months ago

Hmm, that is strange. I'll look into it again. Thanks.

lonngxiang commented 6 months ago

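From debugging, the failure seems to happen in roughly this function, but the function gives no error message and it is hard to step into it to inspect the implementation details. Upgrading to the latest version does not help either.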

lonngxiang commented 6 months ago

I could not find any _sherpa_onnx-related files in the installed package:

from _sherpa_onnx import (
    OfflineRecognizerConfig,
    OfflineStream,

lonngxiang commented 6 months ago

Running it on Linux, it fails immediately with a segmentation fault.

csukuangfj commented 6 months ago

How did you obtain your offline_file.py?

If you use the code we provide, without any modification, does the problem still occur?

lonngxiang commented 6 months ago

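I am using the official code exactly as is: https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/offline-decode-files.py

I don't yet know whether the cause is the installed package, an environment conflict, or something else; I have not found the specific source of the problem so far.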

csukuangfj commented 6 months ago

Could you try the sherpa-onnx-offline binary built from the C++ code?