Amo5 opened 2 months ago
Maybe it is somehow related to the checkpointing steps.
Try setting --checkpointing_steps=1100 (i.e., higher than the total number of training steps, so no checkpoints are saved during training).
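For context, the reason this disables checkpointing: typical diffusers training scripts save a checkpoint whenever the step counter is a multiple of --checkpointing_steps, so a value larger than the total number of steps never triggers. A minimal sketch of that condition (variable and function names here are illustrative, not the exact script):

```python
def checkpoint_steps(max_train_steps: int, checkpointing_steps: int) -> list[int]:
    """Return the steps at which a checkpoint would be saved.

    Mirrors the common pattern `if global_step % checkpointing_steps == 0`
    found in diffusers-style training loops (names are assumptions).
    """
    return [
        step
        for step in range(1, max_train_steps + 1)
        if step % checkpointing_steps == 0
    ]

print(checkpoint_steps(1000, 500))   # checkpoints at steps 500 and 1000
print(checkpoint_steps(1000, 1100))  # empty: no checkpoints during training
```

So with 1000 training steps, --checkpointing_steps=1100 results in zero saved checkpoints.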
Same problem here, even without checkpointing 🫤 I tried decreasing the number of training steps to stop before the NaNs appear, but the outputs I get are still completely inconsistent.
Hi, @clementjambon @Amo5 , @FerryHuang. Could you please share some details about your setup, like your OS, Python version, installed dependencies, and the exact command you used to run the code? This will help me figure out what's going wrong.
In the meantime, while searching for a solution to this problem, I found some discussions about gradients becoming NaN during DreamBooth training. You can check out this link: link. Let me know if the discussion was helpful.
Linux, Python 3.10, and the dependencies match the versions in requirements exactly. However, I got stable training loss and consistent inference outputs with a learning rate of 5e-6. The results are close to what the paper showcases.
Hi all, I have a question about Sec. 4.4: "All LoRA training was performed on a single image." Can I train a LoRA with multiple images of a specific style? I find that training with only a single image loses some style details. Thank you!
Same problem here.