pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

`torchaudio.info(...).num_frames` can be wrong, which can trigger spurious exceptions #1724

Open grazder opened 2 weeks ago

grazder commented 2 weeks ago

Tested versions

current master

System information

linux

Issue description

I get a wrong `num_frames` here for m4a/Opus files:

https://github.com/pyannote/pyannote-audio/blob/cd3f550d00ea6bfb155dc7aef17e4f9c2516ee55/pyannote/audio/core/io.py#L280

A repro and a description of the error can be found here: https://github.com/pytorch/audio/issues/3731

It's better not to trust this method. For example, the shortest item in my dataset is about 121 s long, but the minimum `torchaudio_info(...).num_frames` is 5674, which is obviously wrong.
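To make the mismatch concrete, here is a minimal sketch of a sanity check that compares the frame count reported in the metadata with the frame count obtained by actually decoding the file. The helper names (`duration_seconds`, `frames_disagree`, `check_file`) and the 1% tolerance are hypothetical and not part of pyannote-audio or torchaudio; the 16 kHz sample rate in the usage lines is an assumption, since the report does not state one.

```python
def duration_seconds(num_frames: int, sample_rate: int) -> float:
    """Duration implied by a frame count at a given sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_frames / sample_rate


def frames_disagree(reported: int, decoded: int, rel_tol: float = 0.01) -> bool:
    """True when the reported frame count differs from the decoded one
    by more than rel_tol (relative to the decoded count)."""
    if decoded <= 0:
        return reported != decoded
    return abs(reported - decoded) > rel_tol * decoded


def check_file(path: str, rel_tol: float = 0.01) -> bool:
    """Compare metadata against a full decode of the file.

    Returns True if the two frame counts disagree. Decoding the whole
    file is slow, so this is meant for debugging, not for a hot path.
    """
    import torchaudio  # imported lazily so the helpers above work without it

    info = torchaudio.info(path)
    waveform, _ = torchaudio.load(path)  # shape: (channels, frames)
    return frames_disagree(info.num_frames, waveform.shape[1], rel_tol)


# Numbers from the report, with an assumed 16 kHz sample rate:
reported_frames = 5674
expected_frames = 121 * 16000  # ~121 s of audio
print(duration_seconds(reported_frames, 16000))           # well under one second
print(frames_disagree(reported_frames, expected_frames))  # True
```

At any common sample rate, 5674 frames amounts to less than a second of audio, so it cannot describe a 121 s file.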

Minimal reproduction example (MRE)

https://github.com/pytorch/audio/issues/3731

grazder commented 2 weeks ago

https://github.com/pytorch/audio/issues/3573

hbredin commented 2 weeks ago

Thanks for the report. What solution do you suggest?