p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

RuntimeError: The expanded size of the tensor #92

Closed Moon-sung-woo closed 3 months ago

Moon-sung-woo commented 4 months ago

Hi. I'm training VITS2 and hit the error below. I deleted the file that triggered it and restarted training, but the same error then occurred on a different file. Do you happen to know what the problem is?

Traceback (most recent call last):
  File "/workspace/vits2/train_ms.py", line 632, in <module>
    main()
  File "/workspace/vits2/train_ms.py", line 45, in main
    mp.spawn(
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 241, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 158, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
    fn(i, *args)
  File "/workspace/vits2/train_ms.py", line 272, in run
    train_and_evaluate(
  File "/workspace/vits2/train_ms.py", line 361, in train_and_evaluate
    ) = net_g(x, x_lengths, spec, spec_lengths, speakers)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/vits2/models.py", line 1297, in forward
    z_slice, ids_slice = commons.rand_slice_segments(
  File "/workspace/vits2/commons.py", line 65, in rand_slice_segments
    ret = slice_segments(x, ids_str, segment_size)
  File "/workspace/vits2/commons.py", line 55, in slice_segments
    ret[i] = x[i, :, idx_str:idx_end]
RuntimeError: The expanded size of the tensor (22) must match the existing size (3) at non-singleton dimension 1. Target sizes: [192, 22]. Tensor sizes: [192, 3]

p0p4k commented 3 months ago

Hi, what was the solution to your problem? I am curious.

Moon-sung-woo commented 3 months ago

@p0p4k Hi, I had data shorter than the segment size, so I removed those files.
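For anyone hitting the same message, here is a minimal sketch of why a clip shorter than the segment size produces this kind of mismatch. It is not the repository's code: plain Python lists stand in for tensors, the helper names `slice_segment` and `check_segment` are hypothetical, and the numbers 22 and 3 are simply the sizes reported in the error above. The point is that a slice running past the end of the data is silently truncated, so the result no longer has the expected length.

```python
def slice_segment(frames, start, segment_size):
    """Return frames[start:start + segment_size] (hypothetical helper)."""
    # Slicing past the end does not raise; it just returns fewer items.
    return frames[start:start + segment_size]


def check_segment(frames, start, segment_size):
    """Raise if the slice is shorter than the requested segment."""
    segment = slice_segment(frames, start, segment_size)
    if len(segment) != segment_size:
        raise ValueError(f"expected {segment_size} frames, got {len(segment)}")
    return segment


short_clip = list(range(3))    # a clip with only 3 spectrogram frames
long_clip = list(range(100))   # a clip with plenty of frames

# A long clip yields a full 22-frame segment.
ok = check_segment(long_clip, start=10, segment_size=22)

# A short clip yields only 3 frames, which cannot fill a 22-frame slot.
try:
    check_segment(short_clip, start=0, segment_size=22)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In the real training loop the truncated slice is assigned into a preallocated buffer of the full segment size, which is where the size mismatch is reported.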

p0p4k commented 3 months ago

Oh right. Thanks!

nicemanis commented 3 months ago

The issue is in the `TextAudioSpeakerLoader._filter()` method: https://github.com/p0p4k/vits2_pytorch/blob/1f4f3790568180f8dec4419d5cad5d0877b034bb/data_utils.py#L259C17-L262C39

The wav length estimation is inaccurate. I fixed it like this:

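# Estimate the length in samples from the decoded audio duration.
# Note: recent librosa releases name this keyword `path` instead of `filename`.
wav_length = librosa.get_duration(filename=audiopath) * self.sampling_rate
spec_length = wav_length // self.hop_length
# Skip clips with fewer spectrogram frames than the minimum.
if spec_length < self.min_audio_len // self.hop_length:
    print(f"Audio too short: {audiopath}")
    continue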