Open lareina-a opened 3 months ago
python train.py -c ckpt/config.json -m mymodel
INFO:mymodel:{'train': {'log_interval': 1000, 'eval_interval': 10000, 'save_interval': 10000, 'seed': 1234, 'epochs': 1000, 'optimizer': 'adamw', 'lr_decay_on': True, 'learning_rate': 5e-05, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 32, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 35840, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 1, 'aug': True, 'lambda_commit': 0.02}, 'data': {'sampling_rate': 16000, 'filter_length': 1280, 'hop_length': 320, 'win_length': 1280, 'n_mel_channels': 80, 'mel_fmin': 0, 'mel_fmax': 8000}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [5, 4, 4, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [11, 8, 8, 4, 4], 'mixup_ratio': 0.6, 'n_layers_q': 3, 'use_spectral_norm': False, 'hidden_size': 128}, 'diffusion': {'dec_dim': 64, 'spk_dim': 128, 'beta_min': 0.05, 'beta_max': 20.0}, 'model_dir': '/workspace/raid/ha0/logs_diffhier/mymodel'}
WARNING:mymodel:/root/autodl-tmp/Diff-HierVC-master/utils is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
File "train.py", line 275, in
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/diff/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/root/autodl-tmp/Diff-HierVC-master/train.py", line 68, in run
train_dataset = AudioDataset(hps, training=True)
File "/root/autodl-tmp/Diff-HierVC-master/utils/data_loader.py", line 22, in __init__
self.filelist_path = config.data.train_filelist_path \
AttributeError: 'HParams' object has no attribute 'train_filelist_path'
Can you help me and tell me how to solve it?
Hello, the filelist path should be specified in the config. The training config (config/config.json) is similar to the one in ckpt, but with the file path section added and updated. An example of the file list folder is as follows:
Each text file should include the paths for the wav, F0 and norm F0 files.
Thank you.
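For anyone hitting the same AttributeError, a quick check before launching training can confirm whether the config has the entry the data loader reads. This is only a sketch: the helper name missing_data_keys is hypothetical, and the only key confirmed by the traceback above is train_filelist_path under the data section; any other file path keys the training config may need are not shown here.

```python
import json


def missing_data_keys(config, required=("train_filelist_path",)):
    """Return the required keys that are absent from the config's 'data' section."""
    data = config.get("data", {})
    return [key for key in required if key not in data]


# Inline stand-in for a config like ckpt/config.json, which has no file path entry.
# In practice you would load your own file with json.load(open(path)).
cfg = json.loads('{"data": {"sampling_rate": 16000, "hop_length": 320}}')
print(missing_data_keys(cfg))  # ['train_filelist_path']
```

If the list is non-empty, the config passed with -c is missing the file path section described above.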
wav, F0, and normalized F0 files
Hello,
Thank you for your response. Could you please let me know which dataset you are using? Is it possible to share it? Additionally, do we need to generate the wav, F0, and normalized F0 files ourselves?
Thank you.
Yes, you have to collect the wav files yourself. There are lots of datasets online. Then use something like crepe to generate the pitch embeddings from them. The code seems to expect 2-dimensional pitch embeddings, so I'm just unsqueezing a new dimension 0 and hoping that works. Then you collect the mean and standard deviation and z-score the values to get the normalized pitch embeddings.
All of that seems to mostly work, keeping in mind that the wavs have a minimum length that depends on the segment size.
I still hit an issue, though: during evaluation it fails in the encoder because the pitch embedding mask somehow ends up about twice the length of the pitch embedding, and multiplying them together fails. I have not been able to figure out what is wrong yet. Perhaps unsqueezing a dimension for the pitch embedding is not correct, and the pitch embedding is supposed to be some different 2-dimensional structure.
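The normalization step described above could look roughly like this. It is a sketch under assumptions, not the authors' preprocessing: it computes the mean and standard deviation over voiced frames only (F0 > 0) and leaves unvoiced frames at 0, which the thread does not confirm, and the function name zscore_f0 is made up for illustration.

```python
import numpy as np


def zscore_f0(f0, eps=1e-8):
    """Z-score the voiced frames of a 1-D F0 contour; unvoiced frames (0 Hz) stay 0.

    Assumption: statistics are taken over voiced frames only.
    """
    f0 = np.asarray(f0, dtype=np.float32)
    voiced = f0 > 0
    norm = np.zeros_like(f0)
    if voiced.any():
        mean = f0[voiced].mean()
        std = f0[voiced].std()
        norm[voiced] = (f0[voiced] - mean) / (std + eps)
    return norm


# Toy contour in Hz: two unvoiced frames around three voiced ones.
f0 = np.array([0.0, 100.0, 120.0, 140.0, 0.0], dtype=np.float32)
norm_f0 = zscore_f0(f0)

# Add a leading axis so both arrays are 2-D with shape (1, T),
# mirroring the unsqueeze mentioned in the comment above.
f0_2d, norm_f0_2d = f0[None, :], norm_f0[None, :]
print(f0_2d.shape, norm_f0_2d.shape)
```

Whether the statistics should be per utterance or per speaker is a separate choice that the thread does not settle.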
During evaluation, the length of the pitch embedding mask does not match the length of the pitch embedding itself, leading to a failure. How can this issue be resolved? If you have a solution, could you please share it?
I have no clue. All I can guess is that the pitch embeddings are supposed to be in some format that I don't know.
I made some changes that get it past evaluation, but then it dies with a similar issue in training.
Obviously, the code is wrong, or the data is, so I'm guessing it's the pitch embeddings.
This code is based on the GradTTS code, like a couple dozen other voice conversion models, and I typically haven't had much of an issue with the pitch embeddings in those other models, so I don't know what's up.
As described in the paper, we use F0 information with four times higher resolution compared to Mel. Therefore, the F0 mask is four times longer than the Mel segment mask. Since the hop size is 320, we used segment length // 80 in the data loader.
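To make the arithmetic concrete, here are the frame counts for the values in the config logged at the top of this thread (segment_size 35840, hop_length 320). The helper frame_counts and its f0_factor parameter are names invented for this sketch; the factor of 4 and the division by 80 come from the explanation above.

```python
def frame_counts(segment_size, hop_length, f0_factor=4):
    """Return (mel_frames, f0_hop, f0_frames) for one training segment."""
    mel_frames = segment_size // hop_length   # Mel frames per segment
    f0_hop = hop_length // f0_factor          # F0 hop is 4x finer than the Mel hop
    f0_frames = segment_size // f0_hop        # same as segment_size // 80 when hop is 320
    return mel_frames, f0_hop, f0_frames


mel_frames, f0_hop, f0_frames = frame_counts(35840, 320)
print(mel_frames, f0_hop, f0_frames)  # 112 80 448
```

So a 35840-sample segment has 112 Mel frames and 448 F0 frames, and an F0 contour extracted at the Mel hop size will be four times too short for the mask.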
Yeah, that's my fault. I haven't read the paper in months. Thank you.
Thanks for making the training code available, by the way! I'm really looking forward to playing with this model.
Just have to produce about 700,000 new pitch embeddings. I'm only using 2 RTX 3090s, so I'm sure I have quite a bit of training time to go through.
I converted the diffusion model in GradSVC to use diffusers (discrete time steps) and a latent space to dramatically speed up training, and to use the diffusers schedulers, so I may take a look at doing that here. If it's not relatively straightforward to reuse that work, though, I'll probably just eat the super long training time.
You're conducting interesting work! I plan to use a diffuser as well. I have used YAAPT, but if you have a large amount of data for training, I recommend using a relatively fast pitch extractor like Parselmouth for real-time extraction!
I modified the data loader. My F0 files are .npy files, so f0 = torch.load(f0_path) was not working; I used
f0 = torch.from_numpy(np.load(f0_path))
f0 = torch.unsqueeze(f0, 0) # to match the dimensions
instead.
Now,
max_f0_start = f0.shape[-1] - self.segment_length//80
gives a negative max_f0_start in some cases, and therefore I'm getting this error:
File "/home/aditya/Diff_Hier_VC/Diff-HierVC/train.py", line 283, in
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/aditya/Diff_Hier_VC/Diff-HierVC/train.py", line 125, in run
train_and_evaluate(rank, epoch, hps, [model, mel_fn, w2v, aug, net_v], optimizer,
File "/home/aditya/Diff_Hier_VC/Diff-HierVC/train.py", line 144, in train_and_evaluate
for batch_idx, (x, norm_f0, x_f0, length) in enumerate(train_loader):
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/_utils.py", line 457, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/miniconda3/envs/dhvc/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
Any fixes?
I have found it
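The thread does not say what the fix was. One plausible cause, given the earlier note about a minimum wav length, is an F0 contour shorter than segment_length // 80 frames. A sketch of a guard for that case follows; pick_f0_start and F0_HOP are hypothetical names, and skipping short files is just one option (padding is another).

```python
import random

F0_HOP = 80  # F0 hop in samples, from the segment_length // 80 in the data loader


def pick_f0_start(f0_len, segment_length, rng=random):
    """Pick a random F0 start frame, or return None if the contour is too short."""
    seg_f0_frames = segment_length // F0_HOP
    max_f0_start = f0_len - seg_f0_frames
    if max_f0_start < 0:
        # Fewer F0 frames than one segment needs: the caller should skip or pad this file.
        return None
    return rng.randint(0, max_f0_start)  # randint is inclusive on both ends


print(pick_f0_start(447, 35840))  # None: one frame short of the 448 required
print(pick_f0_start(448, 35840))  # 0: exactly one segment long
```

Filtering the file list once with the same length check, before training starts, avoids hitting the short files inside DataLoader workers at all.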