A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
2024-05-14 11:09:35,110 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-05-14 11:09:35,110 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-05-14 11:09:35,135 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 7f17021ca099dd6760d43c7a9e69c36a and a total number of 976 components indexed
Detect model requirements, begin to install it: /root/.cache/modelscope/hub/Qwen/Qwen-Audio/requirements.txt
install model requirements successfully
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Try importing flash-attention for faster inference...
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 9/9 [00:00<00:00, 13.09it/s]
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.
2024-05-14 11:09:42,213 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-05-14 11:09:42,213 - modelscope - INFO - Use user-specified model revision: master
ckpt: /root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
rtf_avg: 0.019: 100%|██████████| 1/1 [00:00<00:00, 9.60it/s]
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 91, in load_audio
out = run(cmd, capture_output=True, check=True).stdout
File "/root/miniconda3/envs/funasr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', 'tensor([-0.0001, -0.0002, 0.0007, ..., 0.0000, 0.0000, 0.0000])', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "qwen_demo.py", line 18, in <module>
res = model.generate(input=audio_in, prompt=prompt, batch_size_s=0,)
File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 248, in generate
return self.inference_with_vad(input, input_len=input_len, **cfg)
File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
results = self.inference(
File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 285, in inference
res = model.inference(batch, **kwargs)
File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/qwen_audio/model.py", line 66, in inference
audio_info = self.tokenizer.process_audio(query)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/tokenization_qwen.py", line 556, in process_audio
audio = load_audio(audio_path)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 93, in load_audio
raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
tensor([-0.0001, -0.0002, 0.0007, ..., 0.0000, 0.0000, 0.0000]): No such file or directory
🐛 Bug
Running qwen-audio together with VAD raises an error.
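Reading the traceback, the `ffmpeg` command receives the string form of a waveform tensor (`tensor([-0.0001, -0.0002, 0.0007, ...])`) as its `-i` argument, so `load_audio` appears to be handed audio samples where it expects a file path. As a possible workaround only (not a fix inside FunASR or Qwen-Audio), each segment could be written to a temporary WAV file and the path passed on instead. The sketch below uses only the Python standard library. The helper name `samples_to_temp_wav` is hypothetical, and it assumes the segment is a 1-D sequence of mono float samples in [-1.0, 1.0] at 16 kHz, which matches the `-ar 16000 -ac 1` arguments in the failing command.

```python
import os
import sys
import tempfile
import wave
from array import array


def samples_to_temp_wav(samples, sample_rate=16000):
    """Write mono float samples to a temporary 16-bit PCM WAV file.

    Hypothetical helper: `samples` is assumed to be a 1-D iterable of
    floats in [-1.0, 1.0] (for example the elements of a waveform tensor).
    Returns the path of the file; the caller is responsible for deleting it.
    """
    # Clip to [-1.0, 1.0] and scale to signed 16-bit integers.
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    # WAV stores little-endian samples; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        pcm.byteswap()

    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # 16 kHz by default
        wav.writeframes(pcm.tobytes())
    return path


# Example: a short segment becomes a real file that ffmpeg can open.
segment = [-0.0001, -0.0002, 0.0007, 0.0]
wav_path = samples_to_temp_wav(segment)
```

The returned `wav_path` is an ordinary file path, so a loader that shells out to `ffmpeg -i <path>` can open it. Whether the VAD segments can be intercepted at this point without patching the library is not confirmed here.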
To Reproduce
python qwen_demo.py
Environment