ourpubliccodes closed this issue 4 months ago.
Hi, for the VAE, training for 50 to 150 epochs usually gives satisfactory checkpoints. You can judge this by checking the reconstruction results and the reconstruction loss. Uni-Modal diffusion models usually take 100 to 200 epochs.
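The reply does not say exactly how to read the reconstruction loss. As a rough sketch, assuming you record one validation reconstruction loss per epoch, a simple plateau check could look like the following. `has_plateaued`, `window`, and `min_rel_improvement` are hypothetical names, and the default thresholds are illustrative, not values taken from the repository.

```python
def has_plateaued(losses, window=10, min_rel_improvement=0.01):
    """Return True if the best loss in the last `window` epochs improved on the
    best loss seen before that window by less than `min_rel_improvement`
    (a relative fraction). `losses` holds one value per epoch, oldest first."""
    if len(losses) <= window:
        return False  # not enough history to compare against yet
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    if best_before == 0:
        return True  # a zero loss cannot be improved on
    rel_improvement = (best_before - best_recent) / abs(best_before)
    return rel_improvement < min_rel_improvement
```

For example, `has_plateaued([1.0, 0.5, 0.3, 0.29, 0.289, 0.2885], window=3)` is `False` (about a 3.8% relative improvement), while the same history with `min_rel_improvement=0.05` counts as a plateau. Such a check is only a hint; the reconstruction images should still be inspected visually.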
Hello, may I ask whether a mask file is also required for training a single-modal text-to-face diffusion model? I trained this model on a new dataset without providing a mask file, and found that the training output only improved the image within the default square area.
Hi, if you are referring to the text-to-image model, then no mask is needed.
Hello! Following your instructions, I am trying to retrain the VAE and the uni-modal text-to-face diffusion model on an RTX 3090. How many epochs did you train each of these two models for? Or do you decide when to stop training based on the visual results in reconstructions_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png and samples_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png? Looking forward to your answer.
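The logged image names mentioned in this question follow a `{kind}_gs-{global step}_e-{epoch}_b-{batch}.png` pattern. A minimal sketch for picking out the most recent reconstruction and sample images from a log folder, assuming that naming pattern (the helpers `parse_log_image` and `latest_per_kind` are hypothetical, not part of the repository):

```python
import re

# Matches e.g. "reconstructions_gs-001200_e-000010_b-000000.png".
# Digit counts are left open (\d+) in case the padding differs from the example.
LOG_IMAGE_RE = re.compile(
    r"^(?P<kind>.+?)_gs-(?P<gs>\d+)_e-(?P<epoch>\d+)_b-(?P<batch>\d+)\.png$"
)


def parse_log_image(name):
    """Return (kind, global_step, epoch, batch) or None if the name doesn't match."""
    m = LOG_IMAGE_RE.match(name)
    if m is None:
        return None
    return m["kind"], int(m["gs"]), int(m["epoch"]), int(m["batch"])


def latest_per_kind(names):
    """Return {kind: filename} for the highest global step of each image kind."""
    best = {}
    for name in names:
        parsed = parse_log_image(name)
        if parsed is None:
            continue  # skip unrelated files
        kind, gs, _, _ = parsed
        if kind not in best or gs > best[kind][0]:
            best[kind] = (gs, name)
    return {kind: name for kind, (_, name) in best.items()}
```

Usage, assuming `log_dir` points at wherever your run writes these images: `latest_per_kind(p.name for p in pathlib.Path(log_dir).glob("*.png"))`.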