lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

rotary embedding issues when training in mixed precision #210

Closed · zqevans closed this 8 months ago

zqevans commented 8 months ago

I'm using the ContinuousTransformerWrapper with rotary positional embeddings for latent audio diffusion, and hit an issue where audio quality drops significantly past 2048 latent tokens on a transformer with an embedding dim of 1536. I'm training in mixed precision, and I believe that's causing an issue with how the rotary embeddings are applied.

Temporarily casting to float32 when applying the rotary embeds should hopefully fix this, as seen in crowsonkb's code here.
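For reference, a minimal sketch of the kind of fix being suggested: upcast the queries/keys and frequencies to float32 before the sin/cos rotation, then cast back to the original dtype. The function names and the `freqs` layout below are illustrative assumptions based on the standard rotary formulation, not x-transformers' actual internals.

```python
import torch

def rotate_half(x):
    # split the last dim in two and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(t, freqs):
    # hypothetical helper: do the rotation in float32, then cast back,
    # so the sin/cos math doesn't lose precision under fp16/bf16 autocast
    orig_dtype = t.dtype
    t, freqs = t.float(), freqs.float()
    out = (t * freqs.cos()) + (rotate_half(t) * freqs.sin())
    return out.to(orig_dtype)
```

The upcast matters because at long sequence lengths the position-dependent angles grow large, and fp16 rounding of the sin/cos terms degrades the rotation exactly where the model needs it most, at distant positions.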

That aside, this library has been fantastic to work with. Thanks for your work!

lucidrains commented 8 months ago

@zqevans Zach! thanks for reporting this; do let me know if the latest patch resolves the issue

also, congratulations on StableAudio! 🎶

zqevans commented 8 months ago

Thanks! I'll try this out and let you know.

The x-transformers backbone is giving us our best diffusion models yet, by far; I've basically moved off our previous U-Net code entirely. Hoping to share more results soon!