Training error - Githubissues

Eternity231 commented 2 years ago

I just ran train.py and got this error:

INFO:baker_base:{'train': {'log_interval': 200, 'eval_interval': 10000, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/baker_train.txt', 'validation_files': 'filelists/baker_valid.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs\baker_base'}
WARNING:baker_base:E:\vits\ is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
  File "train.py", line 294, in <module>
    main()
  File "train.py", line 50, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "E:\vits\train.py", line 119, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "E:\vits\train.py", line 139, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1250, in _process_data
    data.reraise()
  File "C:\Python38\lib\site-packages\torch\_utils.py", line 457, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\vits\data_utils.py", line 90, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "E:\vits\data_utils.py", line 61, in get_audio_text_pair
    spec, wav = self.get_audio(audiopath)
  File "E:\vits\data_utils.py", line 67, in get_audio
    raise ValueError("{} {} SR doesn't match target {} SR".format(
IndexError: Replacement index 2 out of range for positional args tuple

Can anyone help me?

lexkoro commented 2 years ago

You have different sample rates in your audio files.

There is also an error in the code https://github.com/jaywalnut310/vits/blob/main/data_utils.py#L68

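Remove the first {} in raise ValueError("{} {} SR doesn't match target {} SR".format(sampling_rate, self.sampling_rate)). The format string has three {} placeholders but only two arguments are passed, which is what raises the IndexError instead of the intended ValueError.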
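To make the placeholder problem concrete, here is a minimal standalone sketch (not the repository's code; the helper name format_sr_error is made up for illustration) showing that the three-placeholder template fails with two arguments while the two-placeholder template produces the intended message:

```python
# Template as it appears in the traceback: three "{}" placeholders.
buggy_template = "{} {} SR doesn't match target {} SR"
# Template after removing the first "{}": two placeholders, two arguments.
fixed_template = "{} SR doesn't match target {} SR"


def format_sr_error(template, file_sr, target_sr):
    """Hypothetical helper: format the message the way the raise statement does."""
    return template.format(file_sr, target_sr)


buggy_error = None
try:
    format_sr_error(buggy_template, 48000, 16000)
except IndexError as exc:
    # str.format runs out of positional arguments at the third placeholder.
    buggy_error = exc
    print("buggy template fails with IndexError:", exc)

fixed_message = format_sr_error(fixed_template, 48000, 16000)
print(fixed_message)
```

With the fix applied, the real ValueError about the sample rate becomes visible instead of being masked by the formatting error.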

Eternity231 commented 2 years ago

I removed it and got this:

INFO:baker_base:{'train': {'log_interval': 200, 'eval_interval': 10000, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/baker_train.txt', 'validation_files': 'filelists/baker_valid.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs\baker_base'}
WARNING:baker_base:E:\vits\ is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
  File "train.py", line 294, in <module>
    main()
  File "train.py", line 50, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "E:\vits\train.py", line 119, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "E:\vits\train.py", line 139, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1250, in _process_data
    data.reraise()
  File "C:\Python38\lib\site-packages\torch\_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\vits\data_utils.py", line 90, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "E:\vits\data_utils.py", line 61, in get_audio_text_pair
    spec, wav = self.get_audio(audiopath)
  File "E:\vits\data_utils.py", line 67, in get_audio
    raise ValueError(" {} SR doesn't match target {} SR".format(
ValueError: 48000 SR doesn't match target 16000 SR

lexkoro commented 2 years ago

ValueError: 48000 SR doesn't match target 16000 SR

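You have a sample-rate mismatch: at least one audio file is sampled at 48000 Hz, but your config sets sampling_rate to 16000. Resample the audio to 16000 Hz so it matches the config.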
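Before resampling, it can help to list which files are off. The sketch below is one possible way to do that with only the Python standard library; it is not part of the repository. The function names are made up for illustration, and it assumes the files are plain PCM WAV and that each filelist line has the form path|text:

```python
import wave
from pathlib import Path


def wav_paths_from_filelist(filelist_path):
    """Read audio paths from a filelist, assuming 'path|text' on each line."""
    paths = []
    with open(filelist_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                paths.append(line.split("|")[0])
    return paths


def find_sample_rate_mismatches(wav_paths, target_sr):
    """Return (path, actual_rate) for every WAV whose rate differs from target_sr."""
    mismatches = []
    for path in wav_paths:
        # The wave module only reads uncompressed PCM WAV files.
        with wave.open(str(path), "rb") as wav_file:
            actual_sr = wav_file.getframerate()
        if actual_sr != target_sr:
            mismatches.append((str(path), actual_sr))
    return mismatches


if __name__ == "__main__":
    # Filelist path and target rate taken from the config printed in the log above.
    filelist = Path("filelists/baker_train.txt")
    if filelist.exists():
        for path, rate in find_sample_rate_mismatches(
            wav_paths_from_filelist(filelist), target_sr=16000
        ):
            print(f"{path}: {rate} Hz")
```

Any file this prints needs to be resampled to 16000 Hz with an audio tool of your choice before training.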

UltimateAmitieKaiNiC commented 2 years ago

CUDA error: device-side assert triggered

Eternity231 commented 2 years ago

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "E:\vits\train.py", line 119, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "E:\vits\train.py", line 192, in train_and_evaluate
    scaler.scale(loss_gen_all).backward()
  File "C:\Python38\lib\site-packages\torch\_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Python38\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: view_as_complex is only supported for float and double tensors, but got a tensor of scalar type: Half

I use torch 1.11.0. Is the torch version causing this error?

lexkoro commented 2 years ago

Yes, try this fix: https://github.com/jaywalnut310/vits/pull/34

Or downgrade torch.
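A third option, not mentioned above and not what the linked pull request does, may be to turn off mixed precision. The log shows 'fp16_run': True, and the error complains about a Half tensor, so the assumption here is that training in full precision keeps half-precision tensors out of the failing operation, at the cost of more memory and slower steps. A minimal sketch of flipping that flag on a config dictionary (the helper name disable_fp16 is made up for illustration):

```python
import json


def disable_fp16(config):
    """Return a copy of a config dict with train.fp16_run set to False."""
    # Round-tripping through JSON is a simple deep copy for JSON-style data.
    updated = json.loads(json.dumps(config))
    updated["train"]["fp16_run"] = False
    return updated


# Small stand-in for the real config; only the keys needed here are shown.
config = {"train": {"fp16_run": True, "batch_size": 16}}
patched = disable_fp16(config)
print(json.dumps(patched, indent=2))
```

In practice the same change is a one-line edit to the training config file: set fp16_run to false.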

jaywalnut310 / vits

Training error #74