modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Is there a combined GPU calling method for streaming Paraformer and streaming VAD? Also, a problem with VAD microphone input #1997

Open EvilCalf opened 2 months ago

EvilCalf commented 2 months ago

Separately, I am running VAD inference on its own, reading data from the microphone with sounddevice, using the following code:

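import sounddevice as sd

sample_rate = 16000  # sampling rate in Hz
channels = 1  # mono
dtype = "int16"  # sample data type
blocksize = 1024  # frames per callback block

# should_stop_asr and send_audio_queue are queues defined elsewhere in my code

def record_audio():
    with sd.InputStream(
        samplerate=sample_rate,
        channels=channels,
        dtype=dtype,
        blocksize=blocksize,
        callback=send_audio,
    ):
        print("Recording started...")
        while should_stop_asr.empty():
            sd.sleep(100)  # keep the stream open without busy-waiting
        print("Recording stopped...")

def send_audio(indata, frames, time, status):
    if status:
        print(status)
    # indata is only valid during the callback, so enqueue a copy
    send_audio_queue.put(indata.copy())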

When I pass the captured data directly as speech_chunk, the returned value is always empty: [{'key': 'rand_key_DFUCd6ZAFDChf', 'value': []}]

res = vad_mod.generate(
    input=audio_bytes,
    cache=cache,
    chunk_size=chunk_size,
    disable_pbar=True,
    fs=16000,
)
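
For reference, the streaming VAD example in the FunASR README appears to treat `chunk_size` as a duration in milliseconds, derive the number of samples per call from it, and pass `is_final=True` with the last chunk. The sketch below shows only that slicing arithmetic in plain Python. `iter_chunks` is a hypothetical helper, not part of FunASR, and whether chunk sizing is related to the empty `value` here is an assumption.

```python
def iter_chunks(samples, sample_rate=16000, chunk_size_ms=200):
    """Yield (chunk, is_final) pairs, each chunk covering chunk_size_ms of audio."""
    stride = int(chunk_size_ms * sample_rate / 1000)  # samples per chunk
    if stride <= 0:
        raise ValueError("chunk_size_ms is too small for this sample rate")
    # Round up so a trailing partial chunk is still emitted.
    total = max(1, (len(samples) + stride - 1) // stride)
    for i in range(total):
        chunk = samples[i * stride:(i + 1) * stride]
        yield chunk, i == total - 1


# One second of silent 16 kHz audio split into 200 ms chunks.
chunks = list(iter_chunks([0] * 16000))
```

Each `(chunk, is_final)` pair would then go to one `vad_mod.generate(...)` call that shares the same `cache` dictionary, so the model can carry state across chunks.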

But if I use the following instead, I get soundfile.LibsndfileError: Error opening <_io.BytesIO object at 0x7f27f007de50>: Format not recognised.

audio_buffer = io.BytesIO(audio_bytes)
audio_data, samplerate = sf.read(audio_buffer)
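
The `Format not recognised` error is consistent with `audio_bytes` being headerless PCM: libsndfile identifies the container from the file header, and raw samples have none. One way around it, sketched below with only the standard library, is to wrap the bytes in an in-memory WAV container first. `pcm16_to_wav_buffer` is a hypothetical helper name, and the sketch assumes 16-bit mono audio at 16 kHz, matching the recording settings above.

```python
import io
import wave


def pcm16_to_wav_buffer(pcm_bytes, sample_rate=16000, channels=1):
    """Wrap headerless 16-bit PCM bytes in an in-memory WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = int16
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    buffer.seek(0)  # rewind so a reader starts at the RIFF header
    return buffer


# Four silent int16 samples standing in for one microphone block.
wav_buffer = pcm16_to_wav_buffer(b"\x00\x00" * 4)
```

The resulting buffer can be handed to `sf.read(wav_buffer)`. soundfile can also read headerless data directly when told the layout, for example `sf.read(audio_buffer, samplerate=16000, channels=1, format="RAW", subtype="PCM_16")`.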

EvilCalf commented 2 months ago

        audio_bytes = phone_datas.input_audio_queue.get()
        print(type(audio_bytes))
        audio_data = np.frombuffer(audio_bytes, dtype=np.int16)
        # if len(audio_data.shape) > 1:
        #     audio_data = audio_data[:, 0]
        # audio_array = np.frombuffer(audio_bytes, dtype=np.float64)
        # print(audio_data)
        # sf.write("output.wav", audio_data, 16000)
        # audio_data, samplerate = sf.read("output.wav")
        phone_datas.all_input_bytes_list.append(audio_bytes)

        # bytes_to_audio(phone_datas.all_input_bytes_list)
        # audio_data, samplerate = sf.read(audio_data)
        res = vad_mod.generate(
            input=audio_data,
            cache=cache,
            chunk_size=8000,
            disable_pbar=True,
        )

The audio_data I get here can be saved with soundfile, and the saved file plays back fine and contains speech, but value is always [].

EvilCalf commented 2 months ago

Saving the recording to a file and reading it back works, but using the recorded data directly does not.
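
A plausible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: `sf.read` returns floating-point samples scaled to [-1.0, 1.0) by default, whereas `np.frombuffer(..., dtype=np.int16)` yields raw integers in [-32768, 32767]. If the VAD front end expects the former, int16 input would be misinterpreted. A minimal sketch of the conversion, with `pcm16_bytes_to_float32` as a hypothetical helper name:

```python
import numpy as np


def pcm16_bytes_to_float32(pcm_bytes):
    """Convert int16 PCM bytes to float32 samples in [-1.0, 1.0)."""
    samples = np.frombuffer(pcm_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0


# Three known int16 values: zero, half scale, and negative full scale.
raw = np.array([0, 16384, -32768], dtype=np.int16).tobytes()
audio = pcm16_bytes_to_float32(raw)
```

The converted array could then be passed as `input=audio` in place of the int16 array, which mirrors what the save-then-read round trip through soundfile does implicitly.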