modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com
Other
4.62k stars 514 forks source link

README demo: ValueError: not enough values to unpack (expected 3, got 1) #1632

Open flystar8 opened 2 months ago

flystar8 commented 2 months ago

What is your question?

README demo: ValueError: not enough values to unpack (expected 3, got 1)

Code

from funasr import AutoModel

chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")

import soundfile import os

wav_file = os.path.join(model.model_path, "/Volumes/workspace/src/openai/audio.wav") speech, sample_rate = soundfile.read(wav_file) chunk_stride = chunk_size[1] * 960 # 600ms

cache = {} total_chunk_num = int(len((speech)-1)/chunk_stride+1) for i in range(total_chunk_num): speech_chunk = speech[ichunk_stride:(i+1)chunk_stride] is_final = i == total_chunk_num - 1 res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back) print(res)

res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 229, in generate return self.inference(input, input_len=input_len, cfg) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 265, in inference res = model.inference(batch, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/models/paraformer_streaming/model.py", line 546, in inference tokens_i = self.generate_chunk(speech, speech_lengths, key=key, tokenizer=tokenizer, cache=cache, frontend=frontend, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/models/paraformer_streaming/model.py", line 418, in generate_chunk encoder_out, encoder_out_lens = self.encode_chunk(speech, speech_lengths, cache=cache, is_final=kwargs.get("is_final", False)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/models/paraformer_streaming/model.py", line 167, in encode_chunk encoder_out, encoder_outlens, = self.encoder.forward_chunk(speech, speech_lengths, cache=cache["encoder"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/models/scama/encoder.py", line 437, in forward_chunk xs_pad = self.embed(xs_pad, cache) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/funasr/models/transformer/embedding.py", line 430, in forward batch_size, timesteps, input_dim = x.size() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ValueError: not enough values to unpack (expected 3, got 1)

LauraGPT commented 2 months ago

Check the format of wav, via, soxi