jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License
1.61k stars, 178 forks

decoding error about padding #117

Closed: Macsim2 closed this issue 1 year ago

Macsim2 commented 1 year ago

First of all, thank you @jianfch for the timestamped Whisper. However, I'm running into the following error while decoding:

Traceback (most recent call last):
  File "test.py", line 23, in <module>
    result = model.transcribe({file_path})
  File "{some_path}/python3.8/site-packages/stable_whisper/whisper_word_level.py", line 351, in transcribe_stable
    mel_segment = log_mel_spectrogram(audio_segment)
  File "{some_path}/lib/python3.8/site-packages/whisper/audio.py", line 138, in log_mel_spectrogram
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
  File "{some_path}/lib/python3.8/site-packages/torch/functional.py", line 604, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (200, 200) at dimension 2 of input [1, 1, 8]
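The last frame of the traceback points at the cause. `torch.stft`, called with its default `center=True`, reflect-pads `N_FFT // 2 = 200` samples on each side, and reflect padding requires the pad to be smaller than the input. The audio segment here has only 8 samples. Below is a minimal sketch of a caller-side guard. It assumes Whisper's `N_FFT = 400` and that zero-padding a tiny trailing segment is acceptable. `min_safe_length` and `pad_short_segment` are hypothetical helpers, not part of stable-ts or Whisper, and this is not necessarily how the maintainer fixed the bug:

```python
import numpy as np

# Assumed to match Whisper's STFT window size (whisper/audio.py): 400 samples.
N_FFT = 400


def min_safe_length(n_fft: int = N_FFT) -> int:
    """Smallest 1-D input length a centered, reflect-padded STFT can accept.

    With center=True, torch.stft pads n_fft // 2 samples on each side in
    reflect mode, which requires that pad to be strictly less than the input length.
    """
    return n_fft // 2 + 1


def pad_short_segment(audio: np.ndarray, n_fft: int = N_FFT) -> np.ndarray:
    """Hypothetical helper: zero-pad a segment that is too short for the STFT.

    Segments that are already long enough are returned unchanged.
    """
    min_len = min_safe_length(n_fft)
    if audio.shape[-1] >= min_len:
        return audio
    # Append zeros (silence) at the end so the reflect padding becomes valid.
    return np.pad(audio, (0, min_len - audio.shape[-1]))
```

For example, `pad_short_segment(np.zeros(8, dtype=np.float32))` returns a 201-sample array, which is long enough for the centered STFT. An alternative design is to skip segments that short altogether: at Whisper's 16 kHz sample rate, 8 samples is only half a millisecond of audio.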

If anyone knows what causes this error, any hints would be appreciated. Thank you.

jianfch commented 1 year ago

Should be fixed in the latest version.
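After upgrading (for example with `pip install -U stable-ts`), you can confirm which release is actually installed in the active environment. The snippet below is a small sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution name on PyPI is `stable-ts`. `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "stable-ts") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("stable-ts:", installed_version() or "not installed")
```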

Macsim2 commented 1 year ago

@jianfch Thank you, the problem is solved now.