I ran create_data.py on a bunch of TV show audio with subtitles, then ran run_finetuning.py, and I'm getting this:
Traceback (most recent call last):
  File "/home/me/whisper-finetuning/run_finetuning.py", line 297, in <module>
    main()
  File "/home/me/whisper-finetuning/run_finetuning.py", line 286, in main
    main_loop(
  File "/home/me/whisper-finetuning/run_finetuning.py", line 209, in main_loop
    min_loss = evaluate(model, dev_loader)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/me/whisper-finetuning/run_finetuning.py", line 165, in evaluate
    for x, y_in, y_out in tqdm(dev_loader):
  File "/opt/conda/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1325, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/me/whisper-finetuning/dataloader.py", line 149, in __getitem__
    mel = self._calculate_mel(record.audio_path, next_partial_segment_start, no_timestamps)
  File "/home/me/whisper-finetuning/dataloader.py", line 105, in _calculate_mel
    mel = log_mel_spectrogram(audio_path)
  File "/opt/conda/lib/python3.10/site-packages/whisper/audio.py", line 138, in log_mel_spectrogram
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/functional.py", line 639, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: 2D or 3D (batch mode) tensor expected for input, but got: [ torch.FloatTensor{1,1,0} ]
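My reading of the last line, which may be wrong: the {1,1,0} shape suggests that one audio file decoded to zero samples, and torch.stft cannot reflect-pad an empty signal. Below is a minimal sketch for spotting such files, assuming the clips are WAV files readable by Python's built-in wave module (other formats would need a different reader). The helper names too_short_for_stft and find_bad_wavs are hypothetical and not part of whisper-finetuning; the N_FFT value of 400 and the 16 kHz rate are the constants whisper uses.

```python
import wave

N_FFT = 400          # same value as whisper.audio.N_FFT
SAMPLE_RATE = 16000  # whisper resamples all audio to 16 kHz


def too_short_for_stft(num_samples: int, n_fft: int = N_FFT) -> bool:
    """True if reflect padding in torch.stft would fail.

    With center=True (the default), torch.stft reflect-pads n_fft // 2
    samples on each side, which needs more input samples than the pad width.
    """
    return num_samples <= n_fft // 2


def find_bad_wavs(paths):
    """Return (path, frame_count) for WAV files that are empty or too short.

    Hypothetical helper: assumes every path is a WAV file.
    """
    bad = []
    for path in paths:
        with wave.open(str(path), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
        # Approximate sample count after resampling to 16 kHz.
        resampled = frames * SAMPLE_RATE // rate
        if too_short_for_stft(resampled):
            bad.append((str(path), frames))
    return bad
```

Running find_bad_wavs over the audio paths that went into create_data.py should list any clips worth excluding or re-extracting before fine-tuning.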
Once again thanks for creating this project.
Can't seem to get past this.
Any help would be greatly appreciated. Thanks.