Open prettyprettyboy opened 1 year ago
Hello, thank you for your interest! As you noticed, we trained all of our models with a single GPU and our codebase doesn't currently support distributed training. I can look into adding support for multi-GPU training, but I don't have the bandwidth to work on it immediately.
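In the meantime, a common way to make a single-GPU script tolerate a DDP wrapper is to unwrap only when the wrapper is present, rather than hard-coding `model.module` everywhere. A minimal sketch with a dummy wrapper standing in for `torch.nn.parallel.DistributedDataParallel` (with `accelerate`, `accelerator.unwrap_model(model)` serves the same purpose):

```python
class DummyDDPWrapper:
    """Stand-in for torch.nn.parallel.DistributedDataParallel,
    which exposes the wrapped model as `.module`."""
    def __init__(self, module):
        self.module = module

def unwrap(model):
    """Return the underlying model whether or not it is DDP-wrapped."""
    return model.module if hasattr(model, "module") else model

base = object()
wrapped = DummyDDPWrapper(base)
assert unwrap(wrapped) is base  # multi-GPU case: wrapper is peeled off
assert unwrap(base) is base     # single-GPU case: model passes through
```

This keeps the rest of the training/evaluation code identical in both settings, which tends to avoid exactly the class of errors that appear when only some call sites are switched to `model.module`.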
It's hard for me to comment on the error if you've made local changes to the codebase. If there is a problem with computing perplexity, it may be useful to manually inspect the text generations before the `perplexity.compute()` call. It seems like the input may be malformed in some way?
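For example, a quick sanity check before the metric call could look like the following (a minimal sketch: `generations` stands in for whatever decoded text your script produces, and the commented-out `perplexity.compute()` call assumes the Hugging Face `evaluate` perplexity metric):

```python
def validate_generations(generations):
    """Sanity-check decoded text before passing it to a perplexity metric.

    Returns a cleaned list; raises with a pointed message if an entry is
    malformed (None, non-string, or empty after stripping), which would
    otherwise surface as a confusing error inside the metric.
    """
    cleaned = []
    for i, text in enumerate(generations):
        if not isinstance(text, str):
            raise TypeError(f"generation {i} is {type(text).__name__}, expected str")
        if not text.strip():
            raise ValueError(f"generation {i} is empty or whitespace-only")
        cleaned.append(text.strip())
    return cleaned

generations = ["The cat sat on the mat.", "A latent diffusion model. "]
checked = validate_generations(generations)
for text in checked[:5]:  # eyeball a few generations
    print(repr(text))
# then, e.g.: results = perplexity.compute(predictions=checked, model_id="gpt2")
```

If the validation passes but perplexity still fails, the problem is more likely in how the multi-GPU changes gather or batch the generations than in the text itself.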
Thanks for replying. At first I thought LD4LG was a post-processing or "plug-and-play" method like Diffusion-LM. Post-processing methods usually use a classifier to control the model, whereas you control generation with a class embedding instead. So is LD4LG more like a prefix-tuning method that learns a small set of vectors? Another question: the number of sampling steps in DDIM is usually less than T, but the pseudocode still shows T sampling steps.
For our class-conditional language generation models, we train them similarly to class-conditional vision models (e.g. [1] [2]) and explicitly condition the network on a learnable embedding that specifies the class. Our approach should also be compatible with classifier guidance, although we did not explore that in this work.
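Concretely, class conditioning of this kind usually amounts to a learned embedding table indexed by the class id, trained jointly with the denoising network. A minimal numpy sketch (the sizes, the additive conditioning, and the function names here are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, cond_dim = 4, 8  # illustrative sizes

# Learnable class embedding table: one vector per class, optimized
# together with the network weights (analogous to the label embedding
# in class-conditional image diffusion models).
class_emb = rng.normal(size=(num_classes, cond_dim))

def condition(hidden, class_id):
    """Inject the class signal by adding its embedding to a hidden state.

    Addition is one common choice; concatenation or FiLM-style
    modulation are alternatives.
    """
    return hidden + class_emb[class_id]

h = np.zeros(cond_dim)
h_cond = condition(h, class_id=2)
print(h_cond.shape)  # (8,)
```

This is why the comparison to prefix-tuning is apt: in both cases a small set of learned vectors steers a larger network, rather than an external classifier steering a frozen model at sampling time.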
As you mentioned, DDIM generally improves generation quality when downsampling the timesteps. For simplicity, we omitted the optional downsampling in the pseudocode.
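For completeness, the downsampling just means running the DDIM update on a strided subsequence of the T training timesteps instead of all of them. A small sketch of one common way to build that subsequence (uniform striding; the function name is ours):

```python
def ddim_timesteps(T, num_steps):
    """Return a decreasing subsequence of `num_steps` timesteps drawn
    from the T training steps, uniformly strided (one common DDIM
    schedule). With num_steps == T this reduces to the full T-step
    schedule shown in the pseudocode."""
    stride = T // num_steps
    return list(range(0, T, stride))[::-1]

print(ddim_timesteps(1000, 4))  # [750, 500, 250, 0]
```

Each DDIM update then steps from one timestep in this list to the next, so sampling cost scales with `num_steps` rather than T.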
Hi! I have noticed that you use the "accelerate" library but train the model on only a single GPU. I changed the setup to multiple GPUs and replaced the corresponding code with `model.module`. However, I get an error when computing perplexity:
How can I fix it, or could you release a multi-GPU version? Thanks!