That is indeed very weird. Could you confirm whether this also happens for SD under the same settings?
Does this also happen when you train for a smaller number of steps?
I haven't trained for a smaller number of steps yet, but I'm training SD 1.5 now. I'll reply after it reaches 210k steps.
Thanks!
EDIT: The issue below is probably unrelated to the one above, but I'll keep it up in case anyone runs into a similar situation and finds this. It turns out I had added another model (a frozen copy of the original), and this line of code https://github.com/huggingface/diffusers/blob/v0.20.0-release/examples/text_to_image/train_text_to_image.py#L635 ended up overwriting the actual trained model immediately after saving.
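For anyone trying to spot the same pitfall, here is a minimal sketch of the failure mode I described (the model id and variable names are illustrative, not the actual script's): keeping a frozen copy of the UNet and then copying its parameters back into the trained model right after saving makes every later checkpoint look like the base weights.

```python
import copy
import torch
from diffusers import UNet2DConditionModel

# Load the trainable UNet (model id is illustrative).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
# Keep a frozen reference copy for a custom objective.
frozen_unet = copy.deepcopy(unet).requires_grad_(False)

# ... training loop updates `unet`, checkpoints are saved periodically ...

# The bug: copying the frozen parameters back into the trained model right
# after saving silently resets the weights, so subsequent checkpoints are
# identical to the original model.
with torch.no_grad():
    for p_trained, p_frozen in zip(unet.parameters(), frozen_unet.parameters()):
        p_trained.copy_(p_frozen)
```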
What happens if you check the md5sum of the UNet checkpoints? I'm running into a similar issue: I adapted the training script to run on a specialized objective, and the loss is improving (implying the UNet parameters are changing), but all my checkpoints are identical (even their md5sums).
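In case it helps others hitting the same symptom, here is a quick sketch for checking whether two saved UNet checkpoints actually differ, both by file hash and by comparing the loaded tensors. The paths are placeholders; adjust them to however your checkpoints are laid out on disk.

```python
import hashlib
import torch
from diffusers import UNet2DConditionModel

def md5(path, chunk=1 << 20):
    """Compute the md5 hash of a file in streaming fashion."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Compare the raw checkpoint files (placeholder paths).
print(md5("checkpoint-1000/unet/diffusion_pytorch_model.safetensors"))
print(md5("checkpoint-2000/unet/diffusion_pytorch_model.safetensors"))

# Compare the actual parameter tensors after loading.
a = UNet2DConditionModel.from_pretrained("checkpoint-1000", subfolder="unet")
b = UNet2DConditionModel.from_pretrained("checkpoint-2000", subfolder="unet")
max_diff = max(
    (pa - pb).abs().max().item()
    for pa, pb in zip(a.parameters(), b.parameters())
)
print(f"max absolute parameter difference: {max_diff}")
```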
I was training the SDXL base UNet, which was going great until around step 210k, when the weights suddenly reverted to their original values and stayed that way. I also tried the EMA version, which didn't change at all. I also looked at the tensors' weight values directly, which confirmed my suspicions. Is this a bug, or most likely a mistake on my part? Has anyone experienced something similar with any other SD model?
code:
diffusers/examples/text_to_image/train_text_to_image_sdxl.py

command:

example plot (same seed):
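For reference, the direct check of the tensor weight values mentioned above can be done with a short script along these lines. This is only a sketch: the checkpoint path is a placeholder, and the base weights are assumed to come from the SDXL base repo.

```python
import torch
from diffusers import UNet2DConditionModel

# Base SDXL UNet weights for comparison.
base = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
# A trained checkpoint (placeholder path for your own output_dir).
trained = UNet2DConditionModel.from_pretrained("checkpoint-210000", subfolder="unet")

max_diff = max(
    (pb - pt).abs().max().item()
    for pb, pt in zip(base.parameters(), trained.parameters())
)
# A value near zero means the checkpoint has reverted to the base weights.
print(f"max |base - trained|: {max_diff}")
```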