yardenfren1996 / B-LoRA

Implicit Style-Content Separation using B-LoRA

Why does the loss become NaN after training for more than 500 steps? #16

Open Amo5 opened 2 months ago

FerryHuang commented 2 months ago

I'm hitting the same problem.

yardenfren1996 commented 2 months ago

It may be related to the checkpointing steps. Try setting `--checkpointing_steps=1100` so that no checkpoints are saved during training.

clementjambon commented 2 months ago

Same problem here, even without checkpointing 🫤 I tried decreasing the number of training steps to stop before the NaNs appear, but the outputs I get are still completely inconsistent.

yardenfren1996 commented 1 month ago

Hi @clementjambon, @Amo5, @FerryHuang. Could you please share some details about your setup, such as your OS, Python version, installed dependencies, and the exact command you used to run the code? This will help me figure out what's going wrong.

yardenfren1996 commented 1 month ago

In the meantime, while searching for a solution to this problem, I found some discussions about gradients becoming NaN during DreamBooth training. You can check out this link: link. Let me know if the discussion was helpful.
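
For anyone who wants to debug this locally in the meantime, here's a generic PyTorch sketch (not code from this repo) for locating and containing NaNs. `set_detect_anomaly` pinpoints the backward op that first produces a NaN, and the finite checks let a run skip a poisoned step instead of corrupting the weights:

```python
import torch

# Generic NaN-hunting sketch, not B-LoRA code. Anomaly detection is slow;
# enable it only while debugging.
torch.autograd.set_detect_anomaly(True)

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# One training step with finite-value guards around loss and gradients.
loss = torch.nn.functional.mse_loss(model(x), target)
if torch.isfinite(loss):
    loss.backward()
    # clip_grad_norm_ returns the total norm *before* clipping, so a
    # non-finite value here means the gradients (not the loss) blew up.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    if torch.isfinite(grad_norm):
        optimizer.step()
optimizer.zero_grad(set_to_none=True)
```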

FerryHuang commented 1 month ago

> Hi @clementjambon, @Amo5, @FerryHuang. Could you please share some details about your setup, such as your OS, Python version, installed dependencies, and the exact command you used to run the code? This will help me figure out what's going wrong.

Linux, Python 3.10, and the dependencies match the pinned versions in the requirements exactly. That said, I get stable training loss and consistent inference outputs with a learning rate of 5e-6. The results are close to what the paper showcases.
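
For concreteness, a minimal sketch of what that change amounts to; the optimizer class and the other hyperparameters here are assumptions, only the 5e-6 learning rate is from my run:

```python
import torch

# Sketch only: AdamW and weight_decay are assumptions; 5e-6 is the
# learning rate that gave me stable loss and consistent outputs.
lora_params = [torch.nn.Parameter(torch.zeros(64, 4))]
optimizer = torch.optim.AdamW(lora_params, lr=5e-6, weight_decay=1e-2)
```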

t00350320 commented 1 week ago

Hi all, I have a question about Sec. 4.4: "All LoRA training was performed on a single image." Can I train a LoRA on multiple images in a specific style? I find that training on only a single image loses some of the style details. Thank you!