m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

RuntimeError: Calculated padded input size per channel #373

Open monological opened 1 year ago

monological commented 1 year ago

I ran this before with no issues, but now it's failing.

  File "./venv/lib/python3.10/site-packages/whisperx/alignment.py", line 191, in align
    emissions, _ = model(waveform_segment.to(device))
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torchaudio/models/wav2vec2/model.py", line 116, in forward
    x, lengths = self.feature_extractor(waveforms, lengths)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torchaudio/models/wav2vec2/components.py", line 141, in forward
    x, length = layer(x, length)  # (batch, feature, frame)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torchaudio/models/wav2vec2/components.py", line 90, in forward
    x = self.conv(x)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "./venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size
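
For readers trying to interpret the message: it says a `conv1d` layer with kernel size 2 received an input only 1 frame long. One plausible cause is a segment that is too short for the model's convolutional feature extractor. The sketch below only does the length arithmetic. The layer configuration in `CONV_LAYERS` is an assumption (the kernel and stride values commonly used by wav2vec 2.0 base-style models), not something read from the traceback, and the function names are hypothetical.

```python
# Assumed (kernel, stride) pairs for a wav2vec 2.0 base-style feature extractor.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def first_failing_layer(num_samples, layers=CONV_LAYERS):
    """Return (layer_index, input_len, kernel) for the first layer whose
    kernel is larger than its input, or None if every layer fits."""
    length = num_samples
    for index, (kernel, stride) in enumerate(layers):
        if length < kernel:
            return index, length, kernel
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return None


def min_valid_samples(layers=CONV_LAYERS):
    """Smallest input length that still yields one output frame."""
    length = 1
    # Walk backwards: each layer needs (out - 1) * stride + kernel inputs.
    for kernel, stride in reversed(layers):
        length = (length - 1) * stride + kernel
    return length


print(min_valid_samples())       # 400
print(first_failing_layer(399))  # (6, 1, 2)
print(first_failing_layer(400))  # None
```

Under that assumption, a 399-sample segment reaches the last layer with an input of size 1 against a kernel of size 2, which matches the numbers in the error. At 16 kHz, 400 samples is 25 ms, so very short aligned segments in particular files could explain why only some inputs fail.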

Oheed911 commented 1 year ago

Yes, I'm facing the same issue. It occurs only on some specific files. Does anyone know the possible reason?

Talhazeb commented 1 year ago

Same here; some files trigger this issue.

Oheed911 commented 1 year ago

@m-bain, please look into this issue too. I can't find a solution to it.

pranavbhat12 commented 1 year ago

@m-bain I'm facing the same issue for some files. Can you help us find a solution?

m-bain commented 1 year ago

Check that file is single channel .wav
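
A quick way to run that check, as a minimal sketch using only Python's standard-library `wave` module. The helper names `wav_info` and `is_mono` are hypothetical, and `wave` reads uncompressed PCM files only, so other encodings would need a different tool.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, num_frames) for a PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getframerate(),
            wav_file.getnframes(),
        )


def is_mono(path):
    """True when the file has exactly one channel."""
    return wav_info(path)[0] == 1
```

For example, `is_mono("audio.wav")` (with a placeholder path) returns `False` for a stereo file. The frame count from `wav_info` also shows whether the file itself is unusually short.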

monological commented 1 year ago

Yes, the file is a single-channel .wav file.

On Aug 15, 2023, at 4:44 PM, Max Bain wrote: Check that file is single channel .wav


MahmoudAshraf97 commented 1 year ago

This should be solved by https://github.com/m-bain/whisperX/pull/510