pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Diarization pipeline fails at end of audio file (RuntimeError: Sizes of tensors must match except in dimension 0.) #1752

Open ccmilne opened 3 months ago

ccmilne commented 3 months ago

Tested versions

System information

Ubuntu 22.04.4 LTS - pyannote.audio 3.3.1 - EC2 g5.4xlarge

Issue description

Receiving this error when running the diarization pipeline on an mp3 file:

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 160000 but got size 147200 for tensor number 12 in the list.
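A note on the numbers in that message: assuming the audio is processed at 16 kHz (an assumption, the thread does not state the sample rate), 160000 samples is a 10 second chunk and 147200 samples is a 9.2 second chunk. That would be consistent with the last chunk of the file coming out shorter than the others, so it cannot be stacked into a batch with them. A small sketch of the arithmetic; the helper name `samples_to_seconds` is made up for illustration:

```python
# Assumption: 16 kHz audio. The sample rate is not given in the thread.
SAMPLE_RATE = 16_000


def samples_to_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Convert a number of audio samples to a duration in seconds."""
    return num_samples / sample_rate


expected_samples = 160_000  # size the error message expected
actual_samples = 147_200    # size of the offending tensor

# Number of samples the short chunk is missing compared to a full one.
missing_samples = expected_samples - actual_samples

print("expected chunk:", samples_to_seconds(expected_samples), "s")
print("actual chunk:  ", samples_to_seconds(actual_samples), "s")
print("missing:       ", missing_samples, "samples")
```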

Code to reproduce:

image

The audio file can be found on the Supreme Court's website: https://www.supremecourt.gov/oral_arguments/audio/2023/23-334

Full error:

image

Minimal reproduction example (MRE)

https://colab.research.google.com/drive/1odeZBhMTI7Ku4umLZ12VJkqrlVk0MRLk?usp=sharing

qalabeabbas49 commented 2 months ago

Hi, I am not sure but try converting mp3 to wav and trying again.

ccmilne commented 2 months ago

Converting to a WAV file worked. Not sure why, but thanks!

qalabeabbas49 commented 2 months ago

It has something to do with the torchaudio backend. Sometimes it doesn't work well with the mp3 format.
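For anyone landing here with the same error, one possible way to do the mp3 to WAV conversion suggested above is to call ffmpeg from Python. This is only a sketch: it assumes ffmpeg is installed and on the PATH, the file name `argument.mp3` and both helper names are hypothetical, and resampling to 16 kHz mono is my assumption rather than something stated in this thread.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src into a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (e.g. an mp3)
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        str(dst),
    ]


def convert_to_wav(src, sample_rate=16000):
    """Convert src to a WAV file next to it and return the new path."""
    src = Path(src)
    dst = src.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst


# Show the command for a hypothetical file without actually running ffmpeg.
print(" ".join(build_ffmpeg_command("argument.mp3", "argument.wav")))
```

Calling `convert_to_wav("argument.mp3")` would then produce `argument.wav`, which can be passed to the diarization pipeline in place of the mp3.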