r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

how to control upsample scales #169

Open james20141606 opened 5 years ago

james20141606 commented 5 years ago

I used the default setting of [4,4,4,4] in 20180510_mixture_lj_checkpoint_step000320000_ema.json for the upsample parameters, and I got an assertion error from

if c is not None and self.upsample_net is not None:
    c = self.upsample_net(c)           # upsample conditioning features in time
    assert c.size(-1) == x.size(-1)    # upsampled length must match the waveform

in wavenet.py. I printed the sizes of c and x: torch.Size([2, 32, 19968]) and torch.Size([2, 1, 9984]), so c is exactly twice as long as x. I tried changing the parameters to [2,4,4,4], but it did not work. Should I change other parameters?
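For reference, the factor of two matches the scale arithmetic (my own quick check, not repo code):

import numpy as np

hop_size = 128
print(19968 / 9984)                      # 2.0 -> c is exactly twice too long
print(np.prod([4, 4, 4, 4]) / hop_size)  # 2.0 -> 256 is twice the hop_size
print(np.prod([2, 4, 4, 4]) / hop_size)  # 1.0 -> this product matches hop_size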

james20141606 commented 5 years ago

By the way, I customized some parameters in the json file as:

{
  "name": "wavenet_vocoder",
  "builder": "wavenet",
  "input_type": "raw",
  "quantize_channels": 65536,
  "sample_rate": 16000,
  "silence_threshold": 2,
  "num_mels": 32,
  "fmin": 125,
  "fmax": 7600,
  "fft_size": 1024,
  "hop_size": 128,
  "frame_shift_ms": null,
  "min_level_db": -100,
  "ref_level_db": 20,
  "rescaling": true,
  "rescaling_max": 0.999,
  "allow_clipping_in_normalization": true,
  "log_scale_min": -32.23619130191664,
  "out_channels": 30,
  "layers": 24,
  "stacks": 4,
  "residual_channels": 512,
  "gate_channels": 512,
  "skip_out_channels": 256,
  "dropout": 0.050000000000000044,
  "kernel_size": 3,
  "weight_normalization": true,
  "cin_channels": 32,
  "upsample_conditional_features": true,
  "upsample_scales": [
    2,
    4,
    4,
    4
  ],
  "cin_pad": 2,
  "freq_axis_kernel_size": 3,
  "gin_channels": -1,
  "n_speakers": 1,
  "pin_memory": true,
  "num_workers": 2,
  "test_size": 0.0441,
  "test_num_samples": null,
  "random_state": 1234,
  "batch_size": 2,
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "adam_eps": 1e-08,
  "amsgrad": false,
  "initial_learning_rate": 0.001,
  "lr_schedule": "noam_learning_rate_decay",
  "lr_schedule_kwargs": {},
  "nepochs": 2000,
  "weight_decay": 0.0,
  "clip_thresh": -1,
  "max_time_sec": null,
  "max_time_steps": 10000,
  "exponential_moving_average": true,
  "ema_decay": 0.9999,
  "checkpoint_interval": 10000,
  "train_eval_interval": 10000,
  "test_eval_epoch_interval": 5,
  "save_optimizer_state": true
}

Could you help me see what's wrong with the settings?

james20141606 commented 5 years ago

I think I solved it. I found that although I changed the upsample parameters to [2,4,4,4], train.py did not receive them, so I changed the code in build_model from

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad

to

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad
upsample_params["upsample_scales"] = hparams.upsample_scales

and this time hparams.upsample_params picks up the upsample scales from the json file.

r9y9 commented 5 years ago

As noted in https://github.com/r9y9/wavenet_vocoder/blob/c0ac05e41f9f563421172034e9398633df172b4f/hparams.py#L75, np.prod(upsample_scales) must be equal to hop_size. This is the reason you got the assertion error.
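In other words, a one-line sanity check (a sketch; assuming the scales live under hparams.upsample_params, as in the user's fix above):

import numpy as np

# This must hold, or the length assertion in wavenet.py fires:
assert np.prod(hparams.upsample_params["upsample_scales"]) == hparams.hop_size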

Looks like you are using an old json file. Top-level upsample_scales doesn't exist anymore (it did in v0.1.1, though).

r9y9 commented 5 years ago

Ah, I haven't updated https://github.com/r9y9/wavenet_vocoder/tree/c0ac05e41f9f563421172034e9398633df172b4f/presets, which may confuse you. I will simply delete them.

james20141606 commented 5 years ago

I used the json file you provided at the Hyper params URL under Pre-trained models. Do you mean we do not need the top-level upsample_scales parameter anymore? Could you provide the new json file? I encountered a similar upsample problem when I tried to use the trained model to synthesize audio files: the upsampled c's size(-1) at line 276 in wavenet.py does not match T.

r9y9 commented 5 years ago

For pretrained models, please check out the specific git commit, as noted in the README.

james20141606 commented 5 years ago

Yeah, I checked out the specific version when trying synthesis. But for training a new model on my own data, I think I may have mixed the older version with the specific version. As for the error: in one case I have a c with size(-1) 1016, and after upsampling it is 129536, a ratio of 127.49606299212599, which does not match the hop size of 128 I provided. The weird thing is that I use the same parameters and the same wavenet.py in train.py, which also uses upsampling, and it runs fine. I am not sure why the upsampling fails in the synthesis.py part.

james20141606 commented 5 years ago

Hey, I'd like to ask again: although the model trains smoothly with the specified upsample scales, it can't be used to synthesize audio with the same json file, since the upsample network does not upsample the conditioning input c by exactly the expected factor (for me it gives 127.xxxx instead of 128). I am not sure what may cause this problem.

r9y9 commented 5 years ago

https://github.com/r9y9/wavenet_vocoder/blob/8cc0c2dc28b2e7e0e6cafa02995b18be9e955df9/datasets/wavallin.py#L97-L100

If you use our preprocessing script, upsampling is expected to work correctly.
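Roughly, the guarantee that preprocessing provides is this (a sketch in my own words, not the exact wavallin.py code):

def align_lengths(wav, mel, hop_size=128):
    # Trim so the waveform length is an exact multiple of hop_size and
    # corresponds to a whole number of mel frames; this keeps the
    # audio-to-feature ratio exactly hop_size, as upsampling expects.
    n_frames = min(len(wav) // hop_size, len(mel))
    return wav[:n_frames * hop_size], mel[:n_frames]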

I'm not really sure what you are hitting. You might want to try pdb or ipdb debugging to isolate your problem.
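For example, one way to break exactly where the lengths disagree (plain pdb, dropped into wavenet.py just before the failing assertion):

c = self.upsample_net(c)
if c.size(-1) != T:
    import pdb; pdb.set_trace()  # inspect c.size(), T, and cin_pad here
assert c.size(-1) == T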

james20141606 commented 4 years ago

Hey, I tried to see what happens in upsample_net when specifying the scales as [2, 4, 4, 4] (which should upsample by 128). During training, when I print c.size(-1) and x.size(-1) in wavenet.py before and after line 196, I find the effective upsample ratio is not 128 (for example: torch.Size([2, 32, 82]) and torch.Size([2, 32, 9984]), a ratio of about 121.8), but fortunately c.size(-1) and x.size(-1) still match.

However, during synthesis, which uses the code at line 275 of wavenet.py:

c = self.upsample_net(c)
assert c.size(-1) == T

this time upsample_net does not produce c.size(-1) == T.

james20141606 commented 4 years ago

I did some further debugging, and there is still something confusing me: at first, in synthesis.py, it seemed the batch_wavegen function's parameters had some problem when applied at line 243. Then I found that the length mismatch may be due to cin_pad: the padding makes len(x)/len(c) != hop_size, so upsample_net(c) does not produce the same length as x. I am not sure how to deal with it.
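A back-of-the-envelope check that seems consistent with this (my assumption: the upsample network consumes 2*cin_pad frames of context padding, so only len(c) - 2*cin_pad frames are actually upsampled):

hop_size, cin_pad = 128, 2
# training case above: c was [2, 32, 82] and x was 9984 samples long
print((82 - 2 * cin_pad) * hop_size)    # 9984   -> matches x
# synthesis case above: c had size(-1) 1016, output was 129536
print((1016 - 2 * cin_pad) * hop_size)  # 129536 -> matches the observed length
# so at synthesis time c apparently needs 2*cin_pad extra frames to reach T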

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.