I wonder why you train a text-conditioned diffusion model from scratch instead of fine-tuning the pretrained Stable Diffusion model with LoRA (or something similar)?
That's because we train a VAE that can better express facial details (e.g., eyes) than LDM's VAE, and then train the UNet in our VAE's latent space.
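For intuition, here is a minimal toy sketch of that two-stage setup, assuming a standard epsilon-prediction DDPM objective: stage 1 trains the face VAE, stage 2 freezes it and trains a text-conditioned UNet in its latent space. The class names (`FaceVAE`, `TextCondUNet`) and all shapes/hyperparameters are illustrative placeholders, not the repo's actual code.

```python
# Toy sketch of the two-stage latent-diffusion training described above.
# All names, shapes, and hyperparameters are hypothetical placeholders.
import torch
import torch.nn.functional as F

class FaceVAE(torch.nn.Module):
    """Stage 1: a VAE trained to reconstruct faces with sharper details."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.enc = torch.nn.Conv2d(3, latent_ch, 8, stride=8)          # toy encoder
        self.dec = torch.nn.ConvTranspose2d(latent_ch, 3, 8, stride=8) # toy decoder
    def encode(self, x):
        return self.enc(x)
    def decode(self, z):
        return self.dec(z)

class TextCondUNet(torch.nn.Module):
    """Stage 2: a noise-prediction 'UNet' in the VAE's latent space
    (timestep embedding omitted in this toy)."""
    def __init__(self, latent_ch=4, text_dim=768):
        super().__init__()
        self.proj = torch.nn.Linear(text_dim, latent_ch)
        self.net = torch.nn.Conv2d(latent_ch, latent_ch, 3, padding=1)
    def forward(self, z_t, t, text_emb):
        cond = self.proj(text_emb)[:, :, None, None]  # broadcast text condition
        return self.net(z_t + cond)

vae = FaceVAE().eval()   # stage-1 weights are frozen during stage 2
unet = TextCondUNet()
opt = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Standard DDPM noise schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

images = torch.randn(2, 3, 256, 256)   # stand-in for a batch of face images
text_emb = torch.randn(2, 768)         # stand-in for text-encoder output

with torch.no_grad():
    z0 = vae.encode(images)            # diffuse in the custom VAE latent space

t = torch.randint(0, T, (z0.shape[0],))
noise = torch.randn_like(z0)
ab = alpha_bar[t][:, None, None, None]
z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * noise   # DDPM forward process

loss = F.mse_loss(unet(z_t, t, text_emb), noise) # epsilon-prediction loss
loss.backward()
opt.step()
opt.zero_grad()
```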
Makes sense! Thanks for your reply.
Hi, could you please give the list of every model's training time, and the number of GPUs you used? Thanks for your reply~
For all training experiments, we used 4 V100 GPUs. The training time is usually several days and varies across models.
Is the number of epochs set to 1000? I'm training the mask2image model, and it takes 1 day for 50 epochs on 8 RTX 3090s, so it seems it would need 20 days to finish training.
Training is stopped at around 100-200 epochs. I found that 100-plus epochs already give relatively stable results, and further training might only provide marginal improvements (at your reported rate of 50 epochs per day, that's roughly 2-4 days rather than 20). If you have the compute, you can train for longer and see whether the sample quality significantly improves at later stages.
Hi, since there have been no follow-up questions for a long time, I'm closing the issue. Feel free to re-open it if you have any further questions. Thanks!