yufan-aslp / AliMeeting

This project is associated with the recently launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in conference scenarios.

Inference error on custom file having 2 speakers. #1

Open saumyaborwankar opened 2 years ago

saumyaborwankar commented 2 years ago
filenames: ['aaaa']
Finished the feature extracting (12921856, 2)

  0%|          | 0/174 [00:00<?, ?it/s]
  0%|          | 0/174 [00:00<?, ?it/s]
INFO:__main__:End:   Processing file aaaa: Elapsed: 1.611116647720337 seconds
Traceback (most recent call last):
  File "VBx/predict.py", line 176, in <module>
    fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
  File "/hdd/saumya/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
    x *= window
ValueError: operands could not be broadcast together with shapes (182,400,2) (400,) (182,400,2) 
# Accounting: time=5 threads=1
# Ended (code 1) at Thu Nov 18 16:16:36 IST 2021, elapsed time 5 seconds

Log path: data/test/dia_part/exp/extract_embedding.1.log

While running inference on the file 'aaaa.wav', the script fails at the embedding extraction step. Can someone help?

yufan-aslp commented 2 years ago

You may be using two-channel audio data. Our model only supports single-channel audio.
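To illustrate why two-channel input triggers this error, here is a minimal NumPy sketch. It is not the code from VBx/features.py: `frame` and `to_mono` are hypothetical helpers, and the 400-sample window with a 160-sample hop is an assumption chosen only to match the window length in the traceback. Framing a (samples, 2) signal gives a (frames, 400, 2) array, which cannot be multiplied by a (400,) window; reducing the signal to one channel first gives (frames, 400), which can.

```python
import numpy as np


def to_mono(signal, channel=0):
    """Return a 1-D signal. Hypothetical helper: keeps one channel of a
    (samples, channels) array and passes 1-D input through unchanged."""
    signal = np.asarray(signal)
    if signal.ndim == 1:
        return signal
    if signal.ndim == 2:
        return signal[:, channel]
    raise ValueError(f"unexpected signal shape {signal.shape}")


def frame(signal, win_len=400, hop=160):
    """Cut a signal into overlapping frames along its first axis.
    Hypothetical helper; window length and hop are assumed values."""
    n_frames = 1 + (len(signal) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


if __name__ == "__main__":
    window = np.hamming(400)
    stereo = np.random.randn(16000, 2)  # stand-in for a two-channel recording

    frames = frame(stereo)
    print(frames.shape)  # trailing axis of size 2 is the channel axis
    try:
        frames *= window  # same kind of operation as `x *= window` in the traceback
    except ValueError as err:
        print("stereo input fails:", err)

    mono_frames = frame(to_mono(stereo))
    mono_frames *= window  # shapes (frames, 400) and (400,) broadcast fine
    print(mono_frames.shape)
```

Keeping one channel mirrors what `remix 1` does in sox; averaging the channels is another option if both carry useful speech.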

saumyaborwankar commented 2 years ago

aaaa sox /hdd/saumya/AliMeeting/speaker/wav_16/aaaa.wav -t wav - remix 1 |

This is the content of my wav.scp
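Since the log above reports a feature shape of (12921856, 2), it may be worth checking how many channels the audio actually has at the point where it is read. The sketch below uses only Python's standard `wave` module, which handles uncompressed PCM WAV files; `count_channels` is a hypothetical helper and the path in the usage line is simply the one from the wav.scp entry.

```python
import wave


def count_channels(path):
    """Return the number of channels in a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels()


if __name__ == "__main__":
    # Path taken from the wav.scp entry in this thread; adjust as needed.
    print(count_channels("/hdd/saumya/AliMeeting/speaker/wav_16/aaaa.wav"))
```

If this reports 2 for the file that the extraction step opens, the sox pipe in wav.scp is probably not being applied at that stage, and converting the file to mono on disk beforehand would be one way around it.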