Closed vasusharma closed 1 month ago
Hello,
Have you pinpointed which file is causing the problem? Both errors suggest that the input waveform has length 0, so my first guess is that some files have corrupted or missing audio streams.
For completeness's sake, I'm using torch 2.0.1 with torchaudio 2.0.2 if that helps.
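In case it helps with narrowing this down, here is a rough sketch of how one might scan a set of files for zero-length audio. The helper name `find_empty_audio` and its `load` parameter are made up for illustration; by default it falls back to `torchaudio.load`, which returns a `(waveform, sample_rate)` pair. Whether `torchaudio.load` can decode your particular container format depends on the backend installed, so treat this as a starting point rather than a definitive check.

```python
def find_empty_audio(paths, load=None):
    """Return (path, reason) pairs for files that fail to load
    or that decode to a waveform with zero samples.

    `load` is a hypothetical hook: any callable that takes a path string
    and returns (waveform, sample_rate). Defaults to torchaudio.load.
    """
    if load is None:
        # Imported lazily so the helper can also be used with a stub loader.
        import torchaudio
        load = torchaudio.load

    bad = []
    for path in paths:
        path = str(path)
        try:
            waveform, _sample_rate = load(path)
        except Exception as exc:  # decoding errors vary by backend
            bad.append((path, f"load failed: {exc}"))
            continue
        # The last dimension is time; 0 samples would trigger
        # the "[2, 0]" window-size assertion downstream.
        if waveform.shape[-1] == 0:
            bad.append((path, "zero-length waveform"))
    return bad

# Example call (the directory name is an assumption, adjust to your data):
#   from pathlib import Path
#   for path, reason in find_empty_audio(Path("my_dataset").rglob("*.wav")):
#       print(path, reason)
```

Any file this reports is a likely candidate for the failing input, and could be skipped or re-encoded before feature extraction.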
I am having issues using Mavil and AV-Hubert, likely originating in torchaudio. When I try to extract features with the model set to 'mavil_base' or 'avhubert_fusion' (using torchaudio version 2.3.1), I get the following errors:
Mavil:
  File "/fsx-ust/vasusharma/envs/av/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
AssertionError: choose a window size 1200 that is [2, 0]
AV-Hubert:
  File "/fsx-ust/vasusharma/envs/av/lib/python3.9/site-packages/torchaudio/functional/functional.py", line 1462, in _apply_sinc_resample_kernel
    waveform = waveform.view(-1, shape[-1])
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
Any ideas what could be going wrong?