p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

custom training #7

Closed · icklerly1 closed this issue 1 year ago

icklerly1 commented 1 year ago

Hi,

I have prepared my training data in the LJSpeech format. I copied vits2_ljs_base.json and changed the paths for the train and val filelists.

Now I want to train with the command:

python train.py -c configs/vits2_ljs_base_de.json -m vits2_ljs_base_de

Unfortunately, I get the following error message. Do you have any advice?

Loading train data: 0%| | 0/1417 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 336, in <module>
    main()
  File "train.py", line 51, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/train.py", line 156, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/opt/8tbdrive1/experiments/vits2_pytorch/train.py", line 181, in train_and_evaluate
    if net_g.use_noise_scaled_mas:
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'use_noise_scaled_mas'

p0p4k commented 1 year ago

Hi, thanks for letting me know about the bug. Please try the latest patch and give me your feedback.
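For context, the error happens because DistributedDataParallel does not forward custom attributes of the wrapped module. A minimal sketch of the pattern (the flag name is taken from the traceback; the rest is illustrative, not the repo's actual training code):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        # Custom flag set on the inner module, as in the traceback above.
        self.use_noise_scaled_mas = True

net_g = Generator()
# After wrapping for multi-GPU training, e.g.
#     net_g = DDP(net_g, device_ids=[rank])
# the flag is no longer reachable directly:
#     net_g.use_noise_scaled_mas         -> AttributeError
#     net_g.module.use_noise_scaled_mas  -> True (read it through .module instead)
```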

icklerly1 commented 1 year ago

@p0p4k thanks :) I am one step further now. I had to disable fp16, but it seems like we are almost there. Unfortunately, now I get the following error. Do you know what is going wrong here?

Loading train data: 0%| | 0/1417 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 347, in <module>
    main()
  File "train.py", line 51, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train.py", line 161, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train.py", line 247, in train_and_evaluate
    scaler.step(optim_g)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 304, in step
    return optimizer.step(*args, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/optim/optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/optim/adamw.py", line 110, in step
    F.adamw(params_with_grad,
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/optim/_functional.py", line 128, in adamw
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (195) must match the size of tensor b (196) at non-singleton dimension 0

p0p4k commented 1 year ago

Maybe delete any previously created checkpoints and start fresh to see if that helps. If it doesn't, delete the cached mel spectrograms and let the model compute them again. I would also suggest training with the nosdp config. Good luck!
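For reference, a minimal sketch of clearing the cached spectrogram tensors, assuming they are stored next to the audio files with a .spec.pt suffix as in the VITS-style data loaders (adjust the directory and pattern to your setup):

```python
from pathlib import Path

# Delete cached spectrograms so the data loader recomputes them on the next run.
# DATASET_DIR is a hypothetical path; point it at your wav directory.
DATASET_DIR = Path("DUMMY1")

for spec_file in DATASET_DIR.rglob("*.spec.pt"):
    spec_file.unlink()
    print(f"deleted {spec_file}")
```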

icklerly1 commented 1 year ago

I am now trying to train with train_ms.py and configs/vits2_vctk_base_de.json. Unfortunately, I get the following error. Do you have any idea what is going wrong here?

Loading train data: 0%| | 0/3752 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "train_ms.py", line 346, in <module>
    main()
  File "train_ms.py", line 51, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 157, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 181, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(loader):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 275, in __getitem__
    return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 221, in get_audio_text_speaker_pair
    spec, wav = self.get_audio(audiopath)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 234, in get_audio
    if self.use_mel_spec_posterior:
AttributeError: 'TextAudioSpeakerLoader' object has no attribute 'use_mel_spec_posterior'

p0p4k commented 1 year ago

Hi, try the latest code; I fixed it in the last patch. Report back to me if you hit any error. Thanks.
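For context, this kind of AttributeError usually means get_audio() reads a flag that is never set in the loader's __init__. A minimal sketch of the pattern, assuming the flag comes from the data hparams (names are illustrative and may differ from the repo's exact code):

```python
from torch.utils.data import Dataset

class TextAudioSpeakerLoader(Dataset):
    """Stripped-down illustration; the real loader also builds filelists,
    applies text cleaners, etc."""

    def __init__(self, audiopaths_sid_text, hparams):
        self.audiopaths_sid_text = audiopaths_sid_text
        self.hparams = hparams  # keep a reference so later lookups like hparams.mel_fmin work
        # Read the flag with a safe default so configs that do not define it still work;
        # get_audio() can then branch on it without raising AttributeError.
        self.use_mel_spec_posterior = getattr(hparams, "use_mel_posterior_encoder", False)

    def __len__(self):
        return len(self.audiopaths_sid_text)
```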

icklerly1 commented 1 year ago

I did update the code. I also deleted the old training folder in logs. Now I get the following error:

INFO:root:Added key: store_based_barrier_key:1 to store for rank: 0
Using mel posterior encoder for VITS2
Using transformer flows pre_conv for VITS2
Using noise scaled MAS for VITS2
NOT using any duration discriminator like VITS1
256 2
Loading train data: 0%| | 0/3752 [00:12<?, ?it/s]
Traceback (most recent call last):
  File "train_ms.py", line 415, in <module>
    main()
  File "train_ms.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 191, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 231, in train_and_evaluate
    (z, z_p, m_p, logs_p, m_q, logs_q), (hidden_x, logw, logw_) = net_g(x, x_lengths, spec, spec_lengths, speakers)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/models.py", line 852, in forward
    z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/models.py", line 597, in forward
    x = self.pre(x) * x_mask
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 259, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [192, 80, 1], expected input[8, 513, 272] to have 80 channels, but got 513 channels instead

p0p4k commented 1 year ago

It should work in the latest patch. Sorry for the silly errors!
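For context, the mismatch comes from feeding the 513-bin linear spectrogram (filter_length // 2 + 1 = 1024 // 2 + 1) into a posterior encoder that now expects 80 mel channels. A hedged sketch of the branch the loader needs, assuming VITS-style mel_processing helpers and config keys (signatures and names may differ slightly from the repo):

```python
from mel_processing import mel_spectrogram_torch, spectrogram_torch  # VITS-style helpers (assumed)

def compute_posterior_features(audio_norm, hps, use_mel_posterior_encoder):
    """Return the features fed to the posterior encoder.

    With a mel posterior encoder, enc_q has in_channels = n_mel_channels (80),
    so the loader must produce mels; otherwise it expects the linear
    spectrogram with filter_length // 2 + 1 bins (513 for n_fft = 1024).
    """
    if use_mel_posterior_encoder:
        return mel_spectrogram_torch(
            audio_norm, hps.filter_length, hps.n_mel_channels, hps.sampling_rate,
            hps.hop_length, hps.win_length, hps.mel_fmin, hps.mel_fmax, center=False)
    return spectrogram_torch(
        audio_norm, hps.filter_length, hps.sampling_rate,
        hps.hop_length, hps.win_length, center=False)
```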

icklerly1 commented 1 year ago

Thanks for your quick help :) Now I get another error:

Traceback (most recent call last):
  File "train_ms.py", line 415, in <module>
    main()
  File "train_ms.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 191, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/train_ms.py", line 220, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(loader):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
    return self._process_data(data)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 7.
Original Traceback (most recent call last):
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/8tbdrive1/experiments/vits2_pytorch/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 277, in __getitem__
    return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 223, in get_audio_text_speaker_pair
    spec, wav = self.get_audio(audiopath)
  File "/opt/8tbdrive1/experiments/vits_copy/vits2_pytorch/data_utils.py", line 253, in get_audio
    self.win_length, self.hparams.mel_fmin, self.hparams.mel_fmax, center=False)
AttributeError: 'TextAudioSpeakerLoader' object has no attribute 'hparams'

p0p4k commented 1 year ago

Try again and let me know.
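As a side note, loader-side errors like the two above can be caught much faster with a small smoke test that builds the dataset from the config and pulls one item, instead of going through mp.spawn each time. A sketch, assuming the VITS-style utils and data_utils interfaces (the returned tuple layout may differ):

```python
import utils
from data_utils import TextAudioSpeakerLoader

# Build the multi-speaker dataset directly from the config and fetch one item,
# so dataset bugs surface without spawning DDP training.
hps = utils.get_hparams_from_file("configs/vits2_vctk_base_de.json")
dataset = TextAudioSpeakerLoader(hps.data.training_files, hps.data)
text, spec, wav, sid = dataset[0]
print(text.shape, spec.shape, wav.shape, sid)
```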

icklerly1 commented 1 year ago

I have started a training run now, but the results are weird: the output is super short, just some mumbling sounds. I have also noticed that I get errors in the dataset because of symbols, e.g. 阳, even though this character is not in my dataset (I have checked to be sure). Do you have an explanation for why this error is shown?

p0p4k commented 1 year ago

What was the error? Are you training on the Chinese language? Did you modify the symbols file with the appropriate symbols?
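One way to track down unexpected characters like 阳 is to run the cleaned transcripts against the symbol table before training. A sketch, assuming the VITS-style text module (text._clean_text and text.symbols) and a pipe-separated filelist; the filelist path and cleaner name are placeholders:

```python
from text import _clean_text          # VITS-style text frontend (assumed)
from text.symbols import symbols

symbol_set = set(symbols)
cleaners = ["english_cleaners2"]      # example; use the text_cleaners from your config

# Scan the training filelist for characters the symbol table does not contain;
# any such character will break text_to_sequence at training time.
with open("filelists/train_filelist.txt", encoding="utf-8") as f:  # hypothetical path
    for line_no, line in enumerate(f, 1):
        transcript = line.rstrip("\n").split("|")[-1]
        cleaned = _clean_text(transcript, cleaners)
        unknown = {ch for ch in cleaned if ch not in symbol_set}
        if unknown:
            print(f"line {line_no}: unknown symbols {sorted(unknown)}")
```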

JohnHerry commented 11 months ago

> I have started a training run now, but the results are weird: the output is super short, just some mumbling sounds. I have also noticed that I get errors in the dataset because of symbols, e.g. 阳, even though this character is not in my dataset (I have checked to be sure). Do you have an explanation for why this error is shown?

Did you train directly with Chinese characters as input instead of pinyin? How large is your training dataset?