Open shanemankiw opened 1 year ago
2.4k steps with 4x gradient accumulation means only 0.6k real optimizer steps. For SD 1.5, convergence usually happens at 4k to 6k real steps, so in your case you would need to wait until around 24k logged steps.
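To make the bookkeeping explicit, here is the arithmetic as a tiny sketch (the numbers mirror the ones above; variable names are just illustrative):

```python
# Relationship between logged steps and real optimizer steps under accumulation.
logged_steps = 2_400                 # steps shown in the progress bar / loss curve
grad_accumulation = 4                # micro-batches per optimizer update
real_steps = logged_steps // grad_accumulation             # 600 real optimizer steps so far

# Rough SD 1.5 convergence range quoted above, converted back to logged steps.
target_real_steps = (4_000, 6_000)
logged_steps_needed = tuple(s * grad_accumulation for s in target_real_steps)
print(real_steps, logged_steps_needed)                      # 600 (16000, 24000)
```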
Thanks! We will wait a while then.
Hi, do you have good results?
Not really in this particular case. But I did learn something about the nature of diffusion model training, especially finetuning: the loss fluctuates and sometimes only decreases slightly, even on a log scale. However, that does not mean the network learned nothing; it may still be getting finetuned.
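One way to see that slight trend is to smooth the raw per-step losses before plotting them. A minimal sketch, assuming the losses were saved to a NumPy file (the filename is made up; this is not part of the repo):

```python
import numpy as np
import matplotlib.pyplot as plt

def ema(values, alpha=0.98):
    """Exponential moving average; higher alpha = heavier smoothing."""
    out, running = [], values[0]
    for v in values:
        running = alpha * running + (1 - alpha) * v
        out.append(running)
    return np.asarray(out)

raw_loss = np.load("loss_log.npy")          # assumed: one loss value per logged step
plt.semilogy(raw_loss, alpha=0.3, label="raw")
plt.semilogy(ema(raw_loss), label="EMA-smoothed")
plt.xlabel("logged step")
plt.ylabel("MSE loss (log scale)")
plt.legend()
plt.show()
```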
@shanemankiw Hi, I am trying to finetune InstantID, which uses a ControlNet, and I get a loss curve like yours. If you got good results with ControlNet, could you share your techniques and hyperparameters (optimizer settings, batch size, number of samples, etc.)?
For me the trick was to let it run a little longer... I did not tune the other hyperparameters carefully, but in general I found that the bigger the batch size, the better the performance.
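For reference, a gradient-accumulation setup along these lines can be sketched with Hugging Face Accelerate. This is a self-contained toy example, not the repo's actual training code; the model, data, and hyperparameters are placeholders, chosen only to show how a per-device batch of 4 with 16 accumulation steps behaves like an effective batch of 64:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=16)

model = nn.Linear(8, 1)                                    # toy stand-in for the diffusion UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
dataloader = DataLoader(dataset, batch_size=4)             # per-device batch size of 4

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    with accelerator.accumulate(model):                    # optimizer steps only every 16 batches
        loss = nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```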
Thanks,
Hi, thanks for this great work! We are trying to train the model to inpaint. We followed your instructions to generate prompts with BLIP and to use gradient accumulation. We now use a batch size of 4 with gradient accumulation of 16, but the loss just fluctuates over time and never goes down. This is the loss curve:
And as a result, the output does not match the input part either. It has ok quality on its own, though. I was considering,