modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Reading PCM audio files seems to fail #2207

Open wjm030612 opened 5 days ago

wjm030612 commented 5 days ago

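I did an offline deployment on a Jetson. After mapping the file path into the Docker container, reading the recording, which is in PCM format, seems to fail. Could you tell me how to fix this? My error output and code are below.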

Key Conformer already exists in model_classes, re-register
Key Linear already exists in adaptor_classes, re-register
Key TransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolution2DTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register
funasr version: 1.1.14.
  0%|                                                     | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 93, in load_audio_text_image_video
    data_or_path_or_list, audio_fs = torchaudio.load(data_or_path_or_list)
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_backend/utils.py", line 205, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_backend/ffmpeg.py", line 297, in load
    return load_audio(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_backend/ffmpeg.py", line 88, in load_audio
    s = torchaudio.io.StreamReader(src, format, None, buffer_size)
  File "/usr/local/lib/python3.10/dist-packages/torio/io/_streaming_media_decoder.py", line 526, in __init__
    self._be = ffmpeg_ext.StreamingMediaDecoder(os.path.normpath(src), format, option)
RuntimeError: Failed to open the input "/home/work1/cache/asr.pcm" (Invalid data found when processing input).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 213, in _load_audio_ffmpeg
    out = run(cmd, capture_output=True, check=True).stdout
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', '/home/work1/cache/asr.pcm', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/work1/asr_test02.py", line 13, in <module>
    res = model.generate(input=file)
  File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 301, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.10/dist-packages/funasr/auto/auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/funasr/models/paraformer_streaming/model.py", line 586, in inference
    audio_sample_list = load_audio_text_image_video(
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 72, in load_audio_text_image_video
    return [
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 73, in <listcomp>
    load_audio_text_image_video(
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 97, in load_audio_text_image_video
    data_or_path_or_list = _load_audio_ffmpeg(data_or_path_or_list, sr=fs)
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 215, in _load_audio_ffmpeg
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
/home/work1/cache/asr.pcm: Invalid data found when processing input

  0%|                                                     | 0/1 [00:00<?, ?it/s]

from funasr import AutoModel
import time
import soundfile
import os

model = AutoModel(model="/home/work1/asr_model/",
        device="cuda",
        disable_update=True)
file = os.path.join("/home/work1/cache/asr.pcm")
# speech, sample_rate = soundfile.read(wav_file)

a = time.time()
res = model.generate(input=file)
print(time.time() - a)
print(res)

I have already installed the ffmpeg packages from both apt and pip.

goddamnVincent commented 4 hours ago

Convert the PCM file to WAV.
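A likely reason this helps: a raw .pcm file has no header, so neither torchaudio nor ffmpeg can tell the sample rate, channel count, or sample format from the file itself, which matches the "Invalid data found when processing input" error in the traceback. Below is a minimal sketch of the conversion using only the Python standard library. The function name `pcm_to_wav` is hypothetical, and the defaults (16 kHz, mono, 16-bit little-endian) are assumptions; they must match how asr.pcm was actually recorded, otherwise the audio will play back at the wrong speed or as noise.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container.

    Assumed defaults (not confirmed by the issue): 16 kHz, mono,
    16-bit samples (sample_width is in bytes). WAV stores 16-bit
    samples little-endian, so the raw data must be little-endian too.
    """
    # Read the raw samples unchanged; no decoding or resampling happens here.
    with open(pcm_path, "rb") as f:
        pcm_bytes = f.read()

    # The wave module writes the RIFF/WAVE header that describes the format.
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)

    return wav_path
```

With the paths from the script above, this would be called as `pcm_to_wav("/home/work1/cache/asr.pcm", "/home/work1/cache/asr.wav")`, and the resulting WAV path passed to `model.generate(input=...)`. The same conversion can be done on the command line by telling ffmpeg the raw format explicitly, for example `ffmpeg -f s16le -ar 16000 -ac 1 -i asr.pcm asr.wav`, again assuming those parameters describe the recording.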