Open shanemankiw opened 1 year ago
2.4k steps with 4x gradient accumulation means only 0.6k real optimizer steps. For SD 1.5, convergence usually happens at 4k to 6k real steps, so in your case you would need to wait until around 24k logged steps.
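To make the bookkeeping explicit, here is the arithmetic as a tiny sketch (the numbers mirror the ones above; variable names are just illustrative):

```python
# Relationship between logged steps and real optimizer steps under accumulation.
logged_steps = 2_400                 # steps shown in the progress bar / loss curve
grad_accumulation = 4                # micro-batches per optimizer update
real_steps = logged_steps // grad_accumulation             # 600 real optimizer steps so far

# Rough SD 1.5 convergence range quoted above, converted back to logged steps.
target_real_steps = (4_000, 6_000)
logged_steps_needed = tuple(s * grad_accumulation for s in target_real_steps)
print(real_steps, logged_steps_needed)                      # 600 (16000, 24000)
```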
Thanks! We will wait a while then.
Hi, do you have good results?
Not really in this particular case. But I did learn something about the nature of diffusion model training, especially finetuning: the loss fluctuates and sometimes only decreases slightly, even on a log scale. However, that does not mean the network learned nothing; it may still be getting finetuned.
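One way to see that slight trend is to smooth the raw per-step losses before plotting them. A minimal sketch, assuming the losses were saved to a NumPy file (the filename is made up; this is not part of the repo):

```python
import numpy as np
import matplotlib.pyplot as plt

def ema(values, alpha=0.98):
    """Exponential moving average; higher alpha = heavier smoothing."""
    out, running = [], values[0]
    for v in values:
        running = alpha * running + (1 - alpha) * v
        out.append(running)
    return np.asarray(out)

raw_loss = np.load("loss_log.npy")          # assumed: one loss value per logged step
plt.semilogy(raw_loss, alpha=0.3, label="raw")
plt.semilogy(ema(raw_loss), label="EMA-smoothed")
plt.xlabel("logged step")
plt.ylabel("MSE loss (log scale)")
plt.legend()
plt.show()
```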
@shanemankiw Hi, I am trying to finetune InstantID, which uses a ControlNet, and I get a loss curve like yours. If you got good results with ControlNet, could you share your techniques and hyperparameters (optimizer settings, batch size, number of samples, etc.)?
For me the trick was to let it run a little longer... I did not tune the other hyperparameters carefully, but in general I found that the bigger the batch size, the better the performance.
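For reference, a gradient-accumulation setup along these lines can be sketched with Hugging Face Accelerate. This is a self-contained toy example, not the repo's actual training code; the model, data, and hyperparameters are placeholders, chosen only to show how a per-device batch of 4 with 16 accumulation steps behaves like an effective batch of 64:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=16)

model = nn.Linear(8, 1)                                    # toy stand-in for the diffusion UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
dataloader = DataLoader(dataset, batch_size=4)             # per-device batch size of 4

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    with accelerator.accumulate(model):                    # optimizer steps only every 16 batches
        loss = nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```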
Thanks,
Hi, thanks for this great work! We are trying to train the model to inpaint. We followed your instructions to generate prompts with BLIP and to use gradient accumulation. We now use a batch size of 4 with gradient accumulation of 16, but the loss just fluctuates over time and never goes down. This is the loss curve:
And as a result, the output does not match the input part either. It has ok quality on its own, though. I was considering,