k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

The input tensor cannot be reshaped to the requested shape #1519

Closed by isgat 2 weeks ago

isgat commented 2 weeks ago

Hi!

Model: sherpa-onnx-zipformer-ru-2024-09-18 (Russian, 俄语)
Wav file: Test.wav

I get this error if the file is longer than about 20 seconds.

Error:

[E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/encoder/encoders.0/layers.0/self_attn_weights/Reshape_3' Status Message: C:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,1295,16}, requested shape:{-1,4589,4,4}
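To make the failing check in that message concrete, here is a small arithmetic sketch in Python. It assumes that `size` in the message is the product of the known (non `-1`) dimensions of the requested shape, and that `input_shape_size` is the element count of the input tensor; the variable names are illustrative only.

```python
from math import prod

# Shapes copied from the error message above.
input_shape = (1, 1295, 16)
requested_shape = (-1, 4589, 4, 4)  # -1 means "infer this dimension"

# Total number of elements in the input tensor.
input_size = prod(input_shape)

# Product of the dimensions that are explicitly given (assumed meaning of `size`).
known_size = prod(d for d in requested_shape if d != -1)

# The reshape is only possible if the input divides evenly into the known dims.
reshape_ok = known_size != 0 and input_size % known_size == 0

print(input_size, known_size, reshape_ok)  # 20720 73424 False
```

The input holds fewer elements than a single slice of the requested shape, so no value of the inferred dimension can satisfy the reshape.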

Decoding the example files provided in the repository works fine. I tried both the compiled version on Windows and the C# API, and I get the same error with each.

Are long files not supported? What should I do?

csukuangfj commented 2 weeks ago

Could you use a VAD to segment your input file?

isgat commented 2 weeks ago

Thanks! It worked! I just didn't know about VAD. I followed the example.
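For readers who land here with the same error, the idea behind the suggestion above (split a long recording at pauses, then decode each piece separately) can be sketched in plain Python. This is not the sherpa-onnx API and not the example the poster followed: `split_on_silence`, `transcribe_long`, and the `decode` callback are hypothetical names, and the simple energy threshold is a toy stand-in for a real VAD model. In practice you would use the VAD example shipped with the project.

```python
import math
from typing import Callable, List, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,      # assumed RMS level separating speech from silence
    min_silence_ms: int = 300,    # assumed pause length that ends a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions whose RMS exceeds threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    n_frames = math.ceil(len(samples) / frame_len)

    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the current segment began
    silent_run = 0    # consecutive silent frames seen inside a segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first silent frame of this pause.
                segments.append((start, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0

    if start is not None:
        # Audio ended mid-segment: close it, trimming any trailing silent frames.
        end = min(len(samples), (n_frames - silent_run) * frame_len)
        segments.append((start, end))
    return segments


def transcribe_long(
    samples: Sequence[float],
    sample_rate: int,
    decode: Callable[[Sequence[float], int], str],
) -> List[Tuple[float, float, str]]:
    """Decode each detected segment separately; return (start_s, end_s, text)."""
    results = []
    for start, end in split_on_silence(samples, sample_rate):
        text = decode(samples[start:end], sample_rate)
        results.append((start / sample_rate, end / sample_rate, text))
    return results
```

Here `decode` would wrap whatever recognizer call you already use for short files, so each call only ever sees one short segment instead of the whole recording.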