xiaotingxuan closed this issue 1 year ago
Hi, I have a question. How long did it take you to train the model?
About 70 minutes per epoch, using one NVIDIA TITAN RTX.
Thank you!
How do you process the dataset? I don't know why my loss won't go down; it looks weird.
I want to train the diffusion prior model. How do I get the PriorEmbeddingDataset? Thank you.
If you mean getting the CLIP text embeddings and CLIP image embeddings for training the prior, I think you can read ClipCap's code for reference. Extract the CLIP features first, then create your PriorEmbeddingDataset (see the sketch below).
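A minimal sketch of the extraction step using OpenAI's clip package. The image path and caption are placeholders, and how the saved pairs get wrapped into PriorEmbeddingDataset depends on the embedding reader your version of the repo uses, so treat this only as the "extract CLIP features" half:

```python
# Sketch: extract one paired CLIP text/image embedding.
# "example.jpg" and the caption are placeholders for your dataset entries.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
tokens = clip.tokenize(["a photo of a dog"]).to(device)                # placeholder caption

with torch.no_grad():
    image_embed = model.encode_image(image)  # (1, 768) for ViT-L/14
    text_embed = model.encode_text(tokens)   # (1, 768)

# Loop this over the whole dataset, then feed the saved pairs to whatever
# loader your PriorEmbeddingDataset expects.
torch.save({"image_embed": image_embed.cpu(), "text_embed": text_embed.cpu()}, "pair_0.pt")
```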
Hi, in my project I only train the diffusion prior network.
I use the train_prior_config.example.json provided here, and I only change the hyperparameter "use_ema" to false (see the sketch below). I train on the MS-COCO dataset (Karpathy split). During training the loss type is MSE, and the train loss goes down to about 0.2.
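For reference, this is roughly the single change described above. The key path ("train" -> "use_ema") is an assumption about the example config's layout, so check it against your copy of train_prior_config.example.json:

```python
# Sketch: toggle use_ema in the example config.
# The "train" -> "use_ema" key path is an assumed location of the flag.
import json

with open("train_prior_config.example.json") as f:
    config = json.load(f)

config["train"]["use_ema"] = False  # assumed key path; verify in your config

with open("train_prior_config.json", "w") as f:
    json.dump(config, f, indent=2)
```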
For inference, I use diffusion_prior.sample(tokenized_text, n_samples_per_batch=2, cond_scale=1.0) to generate CLIP image embeddings.
I think the generated CLIP image embedding should be similar to the ground-truth CLIP image embedding, but I find the cosine similarity between them is only about 0.7. Is this normal? A sketch of the check is below.
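A minimal sketch of that similarity check, assuming a trained `diffusion_prior`, tokenized captions `tokenized_text`, and the ground-truth CLIP image embeddings `gt_image_embed` are already in scope. Note the sample keyword may be `num_samples_per_batch` rather than `n_samples_per_batch` depending on your DALLE2-pytorch version:

```python
# Sketch: compare sampled prior embeddings against ground-truth CLIP embeddings.
# diffusion_prior, tokenized_text, and gt_image_embed are assumed to exist.
import torch.nn.functional as F

predicted_embed = diffusion_prior.sample(
    tokenized_text,
    cond_scale=1.0,
)

# Per-example cosine similarity; cosine_similarity normalizes both
# vectors internally, so no explicit L2 normalization is needed.
sim = F.cosine_similarity(predicted_embed, gt_image_embed, dim=-1)
print(sim.mean().item())  # the questioner reports roughly 0.7 here
```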