openai / guided-diffusion


The model converges slowly and collapses after 1M training iterations #89

Open Tianshuo-Xu opened 1 year ago

Tianshuo-Xu commented 1 year ago

Hi, thanks for releasing this exciting work! I'm trying to train the 256x256 unconditional model from scratch. Everything seems to go well for the first 1M training iterations, but after 1M iterations the sFID has not been able to drop below 9, and the model sometimes collapses (the sFID increases to more than 20). I ran exactly the flags the codebase instructs:

MODEL_FLAGS="--attention_resolutions 32,16,8 --image_size 256 --num_channels 256 --class_cond False --diffusion_steps 1000 --learn_sigma True --noise_schedule linear --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 False --use_scale_shift_norm True"
TRAIN_FLAGS="--batch_size 32 --lr 1e-4 --save_interval 10000 --log_interval 100 --weight_decay 0.05 --use_checkpoint True"
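(For completeness, I launch training with the repository's scripts/image_train.py entry point, roughly as below; the data path and process count are placeholders for my setup.)

mpiexec -n 8 python scripts/image_train.py --data_dir /path/to/imagenet $MODEL_FLAGS $TRAIN_FLAGS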

Is there anything wrong with these settings, or did I miss something?