lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

Regular sampling and fast sampling not equivalent in unconditional generation #27

Closed by gzhu06 2 years ago

gzhu06 commented 2 years ago

Hi, thank you so much for your implementation.

I trained an unconditional generator, and with the default noise schedule, fast sampling produces a reasonable result at inference, like this:

[Screenshot (2022-05-02): waveform generated with fast sampling]

However, when I set fast_sampling to False, still with the default noise schedule, I got this:

[Screenshot (2022-05-02): waveform generated with fast_sampling=False]

Is this normal? Thanks in advance.

Also, is this setting correct? The maximum betas of the two schedules are different here:

noise_schedule=np.linspace(1e-4, 0.05, 50).tolist()
inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.5]
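To make the difference between the two modes concrete, here is a minimal NumPy sketch of the DDPM-style reverse (ancestral sampling) loop described in the DiffWave paper. It is not the repository's inference.py: `reverse_process` and `predict_noise` are hypothetical names, `predict_noise` stands in for the trained network, and clamping to [-1, 1] after every step is an assumption of this sketch. With fast_sampling=False the loop runs over all 50 betas in noise_schedule and conditions step n on training step n; fast sampling runs the same update over the 6 betas in inference_noise_schedule, but conditions each step on a fractional training step chosen so that the noise levels match.

```python
import numpy as np


def reverse_process(predict_noise, betas, t_cond, num_samples, seed=0):
    """Ancestral sampling over a beta schedule (sketch, not the repo's code).

    predict_noise(x, t): hypothetical stand-in for the trained network; it
        returns an estimate of the noise in x at (possibly fractional)
        training step t.
    betas: the schedule being sampled with (e.g. noise_schedule).
    t_cond: training step to condition each sampling step on.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=np.float64)
    alpha = 1.0 - betas
    alpha_cum = np.cumprod(alpha)

    x = rng.standard_normal(num_samples)  # unconditional: start from white noise
    for n in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t_cond[n])
        # Mean of p(x_{n-1} | x_n): remove the predicted noise, then rescale.
        x = (x - betas[n] / np.sqrt(1.0 - alpha_cum[n]) * eps) / np.sqrt(alpha[n])
        if n > 0:
            # Std of the added noise: sqrt((1 - abar_{n-1}) / (1 - abar_n) * beta_n).
            sigma = np.sqrt((1.0 - alpha_cum[n - 1]) / (1.0 - alpha_cum[n]) * betas[n])
            x = x + sigma * rng.standard_normal(num_samples)
        x = np.clip(x, -1.0, 1.0)  # assumption: keep samples in audio range
    return x
```

For the full (non-fast) case, the call would look like `reverse_process(model_fn, noise_schedule, np.arange(50), num_samples)`, where `model_fn` is again a hypothetical wrapper around the trained model.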
sharvil commented 2 years ago

It's unusual to get such a bad waveform when using fast_sampling=False, especially when it looks reasonable when fast sampling is enabled. I personally haven't trained an unconditional generator, so I'm not sure whether this issue is widespread or specific to your scenario. Maybe @Andrechang has some pointers; they added the unconditional synthesis implementation to this repository.

The noise settings you listed look correct to me, and the maximum betas are expected to be significantly different.

Andrechang commented 2 years ago

It was quite some time ago, but from what I recall, the unconditional generator wasn't good with fast_sampling=False. I didn't change the default params.py other than batch_size, to fit in GPU memory.

gzhu06 commented 2 years ago

Thanks!

I see, but this seems odd to me, because according to the DiffWave paper, fast sampling is an approximation of the original (full-step) sampling procedure. Maybe increasing the number of sampling steps and changing the noise schedule could help?
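On the approximation point: as we read the DiffWave paper, fast sampling runs the reverse update over the short schedule, but conditions each of its steps on a fractional training step whose cumulative noise level matches, found by interpolating between neighbouring training steps on the square root of the cumulative product of (1 - beta). Below is a minimal sketch of that alignment; `align_steps` is our own name, and the repository's inference.py may differ in details.

```python
import numpy as np


def align_steps(train_betas, infer_betas):
    """For each fast-sampling step s, return a fractional training step T[s]
    whose cumulative noise level matches, interpolating on sqrt(alpha_cum)."""
    talpha_cum = np.cumprod(1.0 - np.asarray(train_betas, dtype=np.float64))
    alpha_cum = np.cumprod(1.0 - np.asarray(infer_betas, dtype=np.float64))
    aligned = []
    for s, a in enumerate(alpha_cum):
        # Find the training interval [t, t + 1] that brackets this noise level.
        for t in range(len(talpha_cum) - 1):
            if talpha_cum[t + 1] <= a <= talpha_cum[t]:
                hi, lo = np.sqrt(talpha_cum[t]), np.sqrt(talpha_cum[t + 1])
                aligned.append(t + (hi - np.sqrt(a)) / (hi - lo))
                break
        else:
            # No bracketing interval: the model never saw this noise level.
            raise ValueError(f"inference step {s} is outside the training noise range")
    return np.array(aligned)
```

Because the full schedule conditions every step on an exact integer training step, it involves no such interpolation, which is why one would normally expect it to be at least as good as fast sampling.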

Andrechang commented 2 years ago

I think it's worth trying more sampling steps and different noise schedules to see whether that improves the results.
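If you try more steps, one option (our illustration, not something from this thread or the repository) is to keep the overall noise level of the default schedule while spreading it over more steps. The hypothetical helper below bisects on the maximum beta of a linear schedule until the final cumulative product of (1 - beta) hits a target. Note that changing noise_schedule changes what each training step index means, so the model would have to be retrained; only inference_noise_schedule can be changed for an already trained model.

```python
import numpy as np


def linear_schedule_for_target(num_steps, target_alpha_cum, beta_min=1e-4, iters=100):
    """Hypothetical helper: linear beta schedule with num_steps steps whose
    final cumulative product of (1 - beta) equals target_alpha_cum."""
    def final_alpha_cum(beta_max):
        return np.prod(1.0 - np.linspace(beta_min, beta_max, num_steps))

    lo, hi = beta_min, 0.999  # final_alpha_cum decreases as beta_max grows
    if not (final_alpha_cum(hi) <= target_alpha_cum <= final_alpha_cum(lo)):
        raise ValueError("target not reachable with this beta_min and num_steps")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if final_alpha_cum(mid) > target_alpha_cum:
            lo = mid  # too little noise: increase beta_max
        else:
            hi = mid  # too much noise: decrease beta_max
    return np.linspace(beta_min, 0.5 * (lo + hi), num_steps).tolist()
```

For example, `linear_schedule_for_target(200, np.prod(1.0 - np.linspace(1e-4, 0.05, 50)))` gives a 200-step schedule that ends at the same noise level as the default 50-step one, with a much smaller maximum beta.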