roger-tseng / av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
https://av.superbbenchmark.org/

Issue with torchaudio #3

Closed vasusharma closed 1 month ago

vasusharma commented 1 month ago

I am having issues using Mavil and AV-Hubert, likely in torchaudio. When I try to extract features with the model set to 'mavil_base' or 'avhubert_fusion' (using torchaudio version 2.3.1), I get the following errors:

Mavil:

File "/fsx-ust/vasusharma/envs/av/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
AssertionError: choose a window size 1200 that is [2, 0]

AV-Hubert:

File "/fsx-ust/vasusharma/envs/av/lib/python3.9/site-packages/torchaudio/functional/functional.py", line 1462, in _apply_sinc_resample_kernel
    waveform = waveform.view(-1, shape[-1])
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

Any ideas what could be going wrong?

roger-tseng commented 1 month ago

Hello,

Have you pinpointed which file is causing the problem? Both errors suggest that the input waveform has length 0, so my first guess is that some files have corrupted audio streams.

For completeness's sake, I'm using torch 2.0.1 with torchaudio 2.0.2 if that helps.
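
One way to narrow down the zero-length-waveform guess is to scan the dataset before feature extraction and list every file whose audio decodes to zero samples or fails to load. The sketch below is not part of av-superb: the helper name `find_empty_audio` and the injectable `loader` argument are hypothetical. It assumes that `torchaudio.load` can decode your files and that the loader returns a `(waveform, sample_rate)` pair with the sample axis last.

```python
from typing import Any, Callable, Iterable, List, Tuple


def _default_loader(path: str) -> Tuple[Any, int]:
    # Assumption: torchaudio can decode this file type in your environment.
    # Imported lazily so the helper also works with a custom loader.
    import torchaudio

    return torchaudio.load(path)  # (waveform [channels, frames], sample_rate)


def find_empty_audio(
    paths: Iterable[str],
    loader: Callable[[str], Tuple[Any, int]] = _default_loader,
) -> List[Tuple[str, str]]:
    """Return (path, reason) for every file with unusable audio."""
    bad = []
    for path in paths:
        try:
            waveform, _sample_rate = loader(path)
        except Exception as exc:  # decoding failed outright
            bad.append((path, f"load failed: {exc!r}"))
            continue
        # The last axis holds the samples; zero here matches the
        # "[2, 0]" window-size range and the empty-reshape error above.
        if waveform.shape[-1] == 0:
            bad.append((path, "zero samples"))
    return bad
```

Calling `find_empty_audio(list_of_paths)` on the files passed to the feature extractor should show whether the failures trace back to specific inputs rather than to the torchaudio version.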