modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Dimension problem when processing dual-channel audio? #712

Closed lingfengchencn closed 1 year ago

lingfengchencn commented 1 year ago

Loading dual-channel audio:

import soundfile
speech,sample_rate = soundfile.read(wav_path[2])
speech_length = speech.shape[0]
for s in speech:
    print(s)

Output:

[-0.0007019  -0.00036621]
[ 0.00241089 -0.00021362]
[ 7.08007812e-03 -9.15527344e-05]
[ 0.00650024 -0.00036621]
[ 0.0015564  -0.00073242]
[-0.00170898 -0.00057983]
[-1.52587891e-03  9.15527344e-05]
[-0.00161743  0.00061035]
[-0.0043335   0.00061035]
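Each printed row holds two values because `soundfile.read` returns a 2-D array of shape `(num_samples, num_channels)` for a stereo file. A minimal sketch with a simulated signal (no file I/O needed to see the layout):

```python
import numpy as np

# Simulated stereo signal with the same layout soundfile.read returns
# for a two-channel file: shape (num_samples, num_channels).
speech = np.zeros((8000, 2))
print(speech.shape)     # (8000, 2)
print(speech[0].shape)  # (2,) -- each "sample" is a pair of channel values
```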

Processing it with the Fsmn_vad_online model raises an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[77], line 16
     14     is_final = False
     15 param_dict['is_final'] = is_final
---> 16 segments_result = model_vad_online(audio_in=speech[sample_offset: sample_offset + step],
     17                         param_dict=param_dict)
     18 if segments_result:
     19     print(segments_result)

File ~/miniconda3/envs/asr/lib/python3.10/site-packages/funasr_onnx/vad_bin.py:244, in Fsmn_vad_online.__call__(self, audio_in, **kwargs)
    242 param_dict = kwargs.get('param_dict', dict())
    243 is_final = param_dict.get('is_final', False)
--> 244 feats, feats_len = self.extract_feat(waveforms, is_final)
    245 segments = []
    246 if feats.size != 0:

File ~/miniconda3/envs/asr/lib/python3.10/site-packages/funasr_onnx/vad_bin.py:290, in Fsmn_vad_online.extract_feat(self, waveforms, is_final)
    287 for idx, waveform in enumerate(waveforms):
    288     waveforms_lens[idx] = waveform.shape[-1]
--> 290 feats, feats_len = self.frontend.extract_fbank(waveforms, waveforms_lens, is_final)
    291 # feats.append(feat)
    292 # feats_len.append(feat_len)
    293 
    294 # feats = self.pad_feats(feats, np.max(feats_len))
...
    218 # update self.in_cache

File <__array_function__ internals>:5, in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
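The ValueError itself is a plain NumPy rank mismatch: somewhere downstream, feature arrays built from the 2-D stereo waveform end up one dimension larger than the pipeline's cached mono features, and `np.concatenate` refuses to join them. A minimal reproduction (the exact internal arrays are an assumption; only the error mechanism is shown):

```python
import numpy as np

# Concatenating arrays of different rank reproduces the same error message.
a = np.zeros((3, 4))     # 2-D, as a mono feature array would be
b = np.zeros((2, 3, 4))  # 3-D, as a stereo waveform can produce downstream
try:
    np.concatenate([a, b])
except ValueError as e:
    print(e)
```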
hnluo commented 1 year ago

The VAD model only supports single-channel audio input; you can choose one channel for inference.
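Following this suggestion, selecting one channel is a single slice. A sketch, where `speech` stands in for the `(num_samples, 2)` array returned by `soundfile.read` above:

```python
import numpy as np

# Stand-in for the stereo array loaded with soundfile.read.
speech = np.zeros((16000, 2))

speech_ch0 = speech[:, 0]  # first channel only
print(speech_ch0.shape)    # (16000,) -- 1-D mono, which the VAD accepts
```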

GlocKieHuan commented 1 year ago

I think you could merge the two channels into a single channel and then run inference.

lingfengchencn commented 1 year ago

I think you could merge the two channels into a single channel and then run inference.

Yeah, but then it will be hard to separate the two speakers. In this case, channel 1 is speaker 1 and channel 2 is speaker 2; after the merge, I would need to distinguish them again by speaker verification.
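Since each channel already isolates one speaker, running the VAD over each channel independently keeps the speaker labels intact with no verification step. A sketch; the `model_vad_online` call is left as a comment following the usage in the traceback, and the helper name is hypothetical:

```python
import numpy as np

def split_channels(speech: np.ndarray) -> dict:
    """Map a (num_samples, 2) stereo array to per-speaker mono signals."""
    return {"speaker1": speech[:, 0], "speaker2": speech[:, 1]}

# Each mono signal can then go through the streaming VAD loop separately,
# e.g. model_vad_online(audio_in=mono[offset:offset + step], param_dict=...),
# so the resulting segments stay attributed to the correct speaker.
stereo = np.zeros((16000, 2))
for speaker, mono in split_channels(stereo).items():
    print(speaker, mono.shape)
```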
