yl4579 / AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
MIT License

Error Message: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (1024, 1024) at dimension 2 of input [1, 65621, 2] #11

Open GUUser91 opened 9 months ago

GUUser91 commented 9 months ago

@yl4579 I added an extra line to the train_list.txt file and got this error message:

python train.py --config_path ./Configs/config.yml
{'max_lr': 0.0005, 'pct_start': 0.0, 'epochs': 200, 'steps_per_epoch': 72}
ctc_linear.2.linear_layer.weight does not have same shape torch.Size([178, 256]) torch.Size([80, 256])
ctc_linear.2.linear_layer.bias does not have same shape torch.Size([178]) torch.Size([80])
asr_s2s.embedding.weight does not have same shape torch.Size([178, 512]) torch.Size([80, 256])
asr_s2s.project_to_n_symbols.weight does not have same shape torch.Size([178, 128]) torch.Size([80, 128])
asr_s2s.project_to_n_symbols.bias does not have same shape torch.Size([178]) torch.Size([80])
asr_s2s.decoder_rnn.weight_ih does not have same shape torch.Size([512, 640]) torch.Size([512, 384])

Traceback (most recent call last):
  File "/home/bud/AuxiliaryASR/train.py", line 116, in <module>
    main()
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/train.py", line 98, in main
    train_results = trainer._train_epoch()
  File "/home/bud/AuxiliaryASR/trainer.py", line 186, in _train_epoch
    for train_steps_per_epoch, batch in enumerate(tqdm(self.train_dataloader, desc="[train]"), 1):
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bud/AuxiliaryASR/meldataset.py", line 65, in __getitem__
    mel_tensor = self.to_melspec(wave_tensor)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 619, in forward
    specgram = self.spectrogram(waveform)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 110, in forward
    return F.spectrogram(
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 126, in spectrogram
    spec_f = torch.stft(
  File "/home/bud/AuxiliaryASR/venv/lib/python3.10/site-packages/torch/functional.py", line 648, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (1024, 1024) at dimension 2 of input [1, 65621, 2]
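The shape in the last frame is the giveaway: torch.stft pads along the last dimension of its input, so a waveform of shape (n_samples, 2), as sf.read() returns for a stereo file, leaves a size-2 axis where the (1024, 1024) padding is applied. A quick way to check a suspect file (the path is a placeholder; soundfile is the library the repo's loader already uses):

import soundfile as sf

wave, sr = sf.read("suspect_clip.wav")  # placeholder path
print(wave.shape, sr)  # mono prints (n_samples,); stereo prints (n_samples, 2)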

yl4579 commented 9 months ago

Which line did you add? Does this error only happen when you have your extra line?

GUUser91 commented 9 months ago

@yl4579 I just converted the audio from this YouTube video down to a 1-second 24 kHz wav file: https://www.youtube.com/watch?v=GQt4SY_6-w4. Here is the converted wav file: https://files.catbox.moe/2jfn9u.wav. I also get the same error message if I add any more lines and wav files from this page: https://huggingface.co/Bluebomber182/Judy-Hopps/tree/main/audio. Note that I also converted those to 24 kHz after downloading them.
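For reference, a mono 24 kHz conversion like the one described can be scripted as below; this is a sketch using librosa, not necessarily the tool that was used here, and the file names are placeholders:

import librosa
import soundfile as sf

# librosa.load resamples to the requested rate and downmixes to mono by default
wave, sr = librosa.load("input.wav", sr=24000, mono=True)
sf.write("output_24k.wav", wave, sr)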

korakoe commented 4 months ago

As I've found out, all files need to be mono! You can do this by adding:

# Convert to mono if needed
if np.ndim(wave) > 1:
    wave = np.mean(wave, axis=1)

under the sf.read() call in _load_tensor() in meldataset.py.
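Put together, the patched loading step might look like the sketch below; this is a standalone paraphrase of the relevant part of _load_tensor(), with a hypothetical helper name, not the exact repo source:

import numpy as np
import soundfile as sf
import torch

def load_mono_wave(wave_path):  # hypothetical helper mirroring _load_tensor()
    wave, sr = sf.read(wave_path)
    # sf.read returns shape (n_samples, n_channels) for multichannel files;
    # downmix so torch.stft later sees time on the last axis
    if np.ndim(wave) > 1:
        wave = np.mean(wave, axis=1)
    return torch.from_numpy(wave).float(), sr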