lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

Fix spikes in prior training #50

Closed: rom1504 closed this issue 2 years ago

rom1504 commented 2 years ago

https://wandb.ai/laion/diffusion-prior/runs/3o0ic6ou?workspace=user-krish240574

[Screenshot: Screenshot_20220502_172341]

I figure it might be better after gradient clipping. Can you rerun with the latest commits, @krish240574?

rom1504 commented 2 years ago

Actually, it's still happening after gradient clipping, but apparently it doesn't impact performance, so maybe it's OK.
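
For readers unfamiliar with the technique, here is a minimal sketch of gradient clipping by global norm, written in plain Python so it runs without dependencies. The helper name `clip_grad_norm` and the flat list of gradients are assumptions made for illustration; this is not the code from the repository. In PyTorch the equivalent built-in is `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Rescale a flat list of gradients so their L2 norm is at most max_norm.

    Hypothetical helper for illustration only.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Norm exceeds the threshold: shrink every gradient by the same factor,
        # which preserves the update direction.
        grads = [g * clip_coef for g in grads]
    return grads, total_norm
```

Calling `clip_grad_norm([3.0, 4.0], max_norm=1.0)` rescales the gradients to a norm of roughly 1, while gradients whose norm is already below `max_norm` are returned unchanged. Clipping only caps the size of each update; it does not by itself guarantee a smooth loss curve.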

lucidrains commented 2 years ago

@rom1504 added an option that may or may not help https://github.com/lucidrains/DALLE2-pytorch/blob/main/train_diffusion_prior.py#L185

lucidrains commented 2 years ago

@rom1504 added a few more stability measures, including two layernorms from the NormFormer paper, corroborated by @borisdayma's experiments https://twitter.com/borisdayma/status/1517227191477571585
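
As a rough illustration of the idea, the sketch below adds an extra layer norm inside a feed-forward block, after the activation of the first linear layer. This is one of the placements NormFormer proposes (another is a layer norm on the attention output). The thread does not say exactly where the two layernorms were added, so the placement here is an assumption. The function names are hypothetical, the learned scale and bias of the layer norm are omitted for brevity, and the code is plain Python rather than the repository's PyTorch.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(v):
    """Exact GELU activation using the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def feed_forward(x, w1, b1, w2, b2, mid_norm=True):
    """Feed-forward block with an optional layer norm after the first activation.

    mid_norm=True mimics the NormFormer-style placement (assumed here);
    mid_norm=False is the plain block.
    """
    hidden = [gelu(v) for v in linear(x, w1, b1)]
    if mid_norm:
        # Rescale hidden activations before the second projection so that
        # unusually large inputs do not pass straight through to the output.
        hidden = layer_norm(hidden)
    return linear(hidden, w2, b2)
```

With `mid_norm=True`, scaling the input by a large factor leaves the output magnitude bounded, whereas with `mid_norm=False` the output grows with the input. This is the kind of damping that can help with training stability.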

rom1504 commented 2 years ago

This is now solved in the latest runs from @krish240574; thanks for the updates!