First of all, thanks to @jianfch for timestamped Whisper,
but I'm running into the error below while decoding:
Traceback (most recent call last):
File "test.py", line 23, in
result = model.transcribe({file_path})
File "{some_path}/python3.8/site-packages/stable_whisper/whisper_word_level.py", line 351, in transcribe_stable
mel_segment = log_mel_spectrogram(audio_segment)
File "{some_path}/lib/python3.8/site-packages/whisper/audio.py", line 138, in log_mel_spectrogram
stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
File "{some_path}/lib/python3.8/site-packages/torch/functional.py", line 604, in stft
input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (200, 200) at dimension 2 of input [1, 1, 8]
If anyone has run into this error, any hints would be appreciated. Thank you.
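For context, the RuntimeError says the tensor reaching torch.stft has only 8 samples in its last dimension, while log_mel_spectrogram reflection-pads by N_FFT // 2 = 200 samples on each side, which requires the input to be longer than the padding. Below is a minimal diagnostic sketch, not a confirmed fix: "input.wav" is a placeholder for the redacted {file_path}, and zero-padding short audio up to one FFT window is my assumption about a stopgap. If the file itself is long enough, the 8-sample segment is being produced inside stable_whisper's segmentation and this top-level guard won't reach it.

```python
import torch
import torch.nn.functional as F
from whisper.audio import N_FFT, load_audio  # N_FFT = 400 in openai-whisper

import stable_whisper

# "input.wav" is a placeholder for the redacted {file_path} in the traceback.
audio = torch.from_numpy(load_audio("input.wav"))
print(f"loaded {audio.shape[-1]} samples")  # a near-empty or corrupt file is one likely cause

# Assumption: zero-padding audio shorter than one FFT window avoids the
# reflection-padding failure (padding 200 > input length 8) seen above.
if audio.shape[-1] < N_FFT:
    audio = F.pad(audio, (0, N_FFT - audio.shape[-1]))

model = stable_whisper.load_model("base")
result = model.transcribe(audio)
print(result.text)
```

If the printed sample count is tiny (single digits, like the 8 in the traceback), the source file is effectively empty and worth checking before anything else.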