Closed lucasnewman closed 9 months ago
@lucasnewman that's really interesting that learned null conditioning led to instability! i'll have to think about that one
rest lgtm!
may be seeing synergy between the gateloop and attention layers (the combined green run actually has fewer parameters than either the gateloop or attention run alone)
recommend giving that a try!
I had a training run's loss blow up to NaN after a while with conditional dropout enabled. Debugging the model weights showed that the null conditioning parameter had gradients enabled and would eventually overflow. This change just disables gradients for the null conditioning parameter. I verified the loss still converges as expected.
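For illustration, here's a minimal PyTorch sketch of the idea (the module and names like `ConditionedBlock` and `null_cond` are hypothetical, not the actual model code): registering the null conditioning embedding with `requires_grad=False` keeps it fixed, so it never accumulates gradients and can't drift or overflow during training.

```python
import torch
import torch.nn as nn

class ConditionedBlock(nn.Module):
    """Hypothetical block with conditional dropout to a fixed null embedding."""

    def __init__(self, dim):
        super().__init__()
        # null conditioning is a frozen parameter: requires_grad=False
        # means the optimizer never updates it and no gradient is stored
        self.null_cond = nn.Parameter(torch.zeros(dim), requires_grad=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, cond=None, cond_drop_prob=0.0):
        batch = x.shape[0]
        if cond is None:
            # fully unconditional: use the null embedding for every sample
            cond = self.null_cond.expand(batch, -1)
        elif cond_drop_prob > 0:
            # randomly drop conditioning per sample, swapping in null_cond
            keep = torch.rand(batch, device=x.device) >= cond_drop_prob
            cond = torch.where(keep[:, None], cond, self.null_cond)
        return self.proj(x + cond)

block = ConditionedBlock(8)
x = torch.randn(2, 8)
out = block(x, cond=torch.randn(2, 8), cond_drop_prob=0.5)
out.sum().backward()
# the frozen null embedding receives no gradient
assert block.null_cond.grad is None
```

With gradients enabled on `null_cond`, every conditionally-dropped sample would push updates into it, which can compound over long runs; freezing it sidesteps that entirely.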
I also included a couple of drive-by fixes (let me know if you want them in another PR):