james20141606 opened 5 years ago
By the way, I customized some parameters in the json file as:
{
"name": "wavenet_vocoder",
"builder": "wavenet",
"input_type": "raw",
"quantize_channels": 65536,
"sample_rate": 16000,
"silence_threshold": 2,
"num_mels": 32,
"fmin": 125,
"fmax": 7600,
"fft_size": 1024,
"hop_size": 128,
"frame_shift_ms": null,
"min_level_db": -100,
"ref_level_db": 20,
"rescaling": true,
"rescaling_max": 0.999,
"allow_clipping_in_normalization": true,
"log_scale_min": -32.23619130191664,
"out_channels": 30,
"layers": 24,
"stacks": 4,
"residual_channels": 512,
"gate_channels": 512,
"skip_out_channels": 256,
"dropout": 0.050000000000000044,
"kernel_size": 3,
"weight_normalization": true,
"cin_channels": 32,
"upsample_conditional_features": true,
"upsample_scales": [
2,
4,
4,
4
],
"cin_pad": 2,
"freq_axis_kernel_size": 3,
"gin_channels": -1,
"n_speakers": 1,
"pin_memory": true,
"num_workers": 2,
"test_size": 0.0441,
"test_num_samples": null,
"random_state": 1234,
"batch_size": 2,
"adam_beta1": 0.9,
"adam_beta2": 0.999,
"adam_eps": 1e-08,
"amsgrad": false,
"initial_learning_rate": 0.001,
"lr_schedule": "noam_learning_rate_decay",
"lr_schedule_kwargs": {},
"nepochs": 2000,
"weight_decay": 0.0,
"clip_thresh": -1,
"max_time_sec": null,
"max_time_steps": 10000,
"exponential_moving_average": true,
"ema_decay": 0.9999,
"checkpoint_interval": 10000,
"train_eval_interval": 10000,
"test_eval_epoch_interval": 5,
"save_optimizer_state": true
}
Could you help me see what's wrong with these settings?
I think I solved it. I found that although I changed the upsample parameters to [2,4,4,4], train.py did not receive them, so I changed the code in build_model from
upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad
to
upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad
upsample_params['upsample_scales'] = hparams.upsample_scales  # newly added line
and this time hparams.upsample_params passes the upsample scale parameters through from the json file.
As noted in https://github.com/r9y9/wavenet_vocoder/blob/c0ac05e41f9f563421172034e9398633df172b4f/hparams.py#L75, np.prod(upsample_scales) must be equal to hop_size. This is the reason you got the assertion error.
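For reference, that constraint is easy to check against the values posted above. If build_model silently fell back to the default [4, 4, 4, 4] scales, as the fix above suggests, the product no longer matches hop_size, which would produce exactly this assertion error. A minimal check:

import numpy as np

hop_size = 128                                   # from the json above
print(int(np.prod([2, 4, 4, 4])) == hop_size)    # True  -> the values in the json are consistent
print(int(np.prod([4, 4, 4, 4])) == hop_size)    # False -> default scales would trip the assertion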
Looks like you are using an old json file. Top-level upsample_scales doesn't exist anymore (it did in v0.1.1, though).
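For anyone hitting the same thing: judging from the build_model snippet above, which reads hparams.upsample_params as a dict and injects cin_channels/cin_pad into it, the scales are now expected to live inside that dict rather than at the top level. A guess at the layout (not copied from the repo, so double-check against your hparams.py):

upsample_params = {
    "upsample_scales": [2, 4, 4, 4],  # np.prod(...) must equal hop_size
}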
Ah, I haven't updated https://github.com/r9y9/wavenet_vocoder/tree/c0ac05e41f9f563421172034e9398633df172b4f/presets, which may confuse you. I will simply delete them.
I used the json file you provided under the "Hyper params" link in the Pre-trained models section. Do you mean we do not need the upsample_scales parameter anymore? Could you provide the new json file? I encountered a similar upsample problem when I tried to use the trained model to synthesize audio files: the upsampled c.size(-1) at line 276 of wavenet.py does not match T.
For pretrained models, please check out the specific git commit noted in the README.
Yeah, I checked out the specific version when trying synthesis, but for training a new model on my own data I think I mixed the older version with that specific version. As for the error, in one case I have a c with size(-1) of 1016, and after upsampling it becomes 129536, a ratio of 127.49606299212599, which does not match the hop size of 128 I provided. The weird thing is that train.py uses the same parameters and the same wavenet.py, also does upsampling, and runs fine. I am not sure why the upsampling fails in the synthesis.py path.
Hey, I'd like to ask again: although the model trains smoothly with these upsample scales, it can't synthesize audio with the same json file, because the upsample network does not upsample the conditioning input c by exactly the expected factor (for me it gives 127.xxxx instead of 128). I am not sure what causes this.
If you use our preprocessing script, upsampling is expected to work correctly.
I'm not really sure what you are hitting. You might want to try pdb or ipdb debugging to isolate your problem.
Hey, I tried to look into what happens in upsample_net when the scales are set to [2, 4, 4, 4] (which should upsample by 128). During training, when I print c.size(-1) and x.size(-1) in wavenet.py before and after line 196, the effective upsample ratio is not 128 (for example, torch.Size([2, 32, 82]) before and torch.Size([2, 32, 9984]) after), but fortunately c.size(-1) and x.size(-1) match.
However, during synthesis, which uses this code at line 275 of wavenet.py:
c = self.upsample_net(c)
assert c.size(-1) == T
the upsample_net does not produce c.size(-1) == T, and the assertion fails.
I did some further debugging and some things still confuse me: at first it seemed the arguments passed to batch_wavegen around line 243 of synthesis.py had a problem; then I found that the length mismatch may be due to cin_pad. With cin_pad, len(x)/len(c) != hop_size, and upsample_net(c) does not produce the same length as x. I am not sure how to deal with this.
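If I read the numbers in this thread correctly, they are consistent with the upsample network consuming cin_pad frames on each side of the conditioning input (an assumption on my part, not verified against the code): the training pipeline apparently supplies c with that extra padding, while the synthesis input does not, which is where the 127.5x ratio comes from. A small sanity check of the arithmetic:

import numpy as np

hop_size, cin_pad = 128, 2

# Training numbers reported above: c has 82 frames, upsampled output has 9984 samples.
print((82 - 2 * cin_pad) * hop_size == 9984)        # True

# Synthesis numbers: c has 1016 frames, upsampled output has 129536 samples.
print((1016 - 2 * cin_pad) * hop_size == 129536)    # True
print(129536 / 1016)                                # 127.496... -> the reported mismatch

# Under that assumption, padding c by cin_pad frames per side before synthesis
# (here on a placeholder (T, num_mels) array) restores the exact 128x ratio:
c = np.zeros((1016, 32), dtype=np.float32)
c = np.pad(c, ((cin_pad, cin_pad), (0, 0)), mode="edge")
print((c.shape[0] - 2 * cin_pad) * hop_size == 1016 * hop_size)  # True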
I used the default setting of [4,4,4,4] in 20180510_mixture_lj_checkpoint_step000320000_ema.json for the upsample parameters, and I got an error. In wavenet.py I print the c and x sizes: torch.Size([2, 32, 19968]) and torch.Size([2, 1, 9984]); it seems c comes out twice as long as x. I tried changing the parameters to [2,4,4,4] but it did not work. Or should I change other parameters?
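The sizes reported here fit the constraint discussed earlier, assuming hop_size is still 128 in this setup: [4, 4, 4, 4] multiplies out to 256, i.e. twice the hop size, which is why the upsampled c is exactly twice as long as x (19968 vs 9984). A quick check:

import numpy as np

hop_size = 128
print(int(np.prod([4, 4, 4, 4])))   # 256 -> upsampled c ends up twice as long as x
print(int(np.prod([2, 4, 4, 4])))   # 128 -> matches hop_size

If [2, 4, 4, 4] still fails, it may be the same issue as above: the value from the json not actually reaching build_model and the upsample network.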