Open chongxian opened 2 weeks ago
When I use this command, the loss is NaN. How can I solve this problem? Thanks for your help.
The dataset is small, just 290 images, but the loss is NaN. I tried setting mixed_precision=bf16 and t5xxl_dtype=bf16, but these settings don't help; the loss is still NaN.
t5xxl_dtype=bf16
I tried this setting, but it doesn't work.
Your loss is NaN from the very start of training, which is most likely caused by fp16 precision. Set mixed_precision=bf16 and do not declare t5xxl_dtype.
It doesn't work; the loss is still NaN.
I have solved the problem now, but it may be a bug in the training code.
Please remove the *_sd3_te.npz files in the training directory whenever you change the mixed precision or t5xxl_dtype. The cache files are then recreated.
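
If it helps, here is a minimal sketch of that cleanup step in Python. The function name remove_te_cache, the dry_run flag, and the recursive search through subfolders are my own assumptions and not part of the training scripts; only the *_sd3_te.npz pattern comes from the advice above. By default it only lists what it would delete.

```python
from pathlib import Path


def remove_te_cache(train_dir, pattern="*_sd3_te.npz", dry_run=True):
    """List (and optionally delete) cached files matching `pattern`.

    Hypothetical helper, not part of the training code. It searches
    `train_dir` and all subfolders, which is an assumption about where
    the cache files live.
    """
    root = Path(train_dir)
    # Collect matching regular files in a stable order.
    matches = sorted(p for p in root.rglob(pattern) if p.is_file())
    for path in matches:
        if not dry_run:
            path.unlink()  # delete the stale cache file
    return matches


# Example (the directory path is a placeholder):
#   remove_te_cache("/path/to/train_dir")                  # preview only
#   remove_te_cache("/path/to/train_dir", dry_run=False)   # actually delete
```

Running it once with the default dry_run=True shows which files would be removed before anything is deleted.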