Open ChangyaoTian opened 3 months ago
We used a total of 4 nodes, each equipped with 8 V100 GPUs, so the total number of GPUs was 32. The global batch size was set to 256, i.e. a per-GPU batch size of 8.
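For reference, the arithmetic behind this setup works out as follows (a minimal sketch; the variable names are illustrative, not from the repo):

```python
# Distributed setup described above: 4 nodes x 8 V100 GPUs.
nodes = 4
gpus_per_node = 8
world_size = nodes * gpus_per_node                    # 32 GPUs in total
global_batch_size = 256
per_gpu_batch_size = global_batch_size // world_size  # 256 / 32 = 8 samples per GPU
print(world_size, per_gpu_batch_size)                 # 32 8
```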
Got it, thanks!
Could you provide the exact hyperparameters used to reproduce the training of VQGAN-LC on ImageNet-1k? Some key parameters differ between the paper and the example script, such as the number of GPUs, batch size, learning rate, and number of epochs.
Thanks!
Another mismatch, found in codebook generation: the default k value in codebook_generation/minibatch_kmeans_per_class.py is 1000, which means the generated codebook size will be 1M (1000 classes × 1000 centers) rather than 100,000. Should it be changed to 100 instead?
Thank you for reporting these issues. The learning rate is set to 5e-4. We trained the tokenizers for only 20 epochs for comparison purposes; longer training may yield higher performance. Additionally, the k value in "minibatch_kmeans_per_class.py" should be set to 100 to generate a 100K codebook.
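The per-class clustering can be sketched with scikit-learn's MiniBatchKMeans. The sizes below are toy values for illustration; the real run would use 1000 ImageNet classes with k = 100 per class, giving a 1000 × 100 = 100,000-entry codebook:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
num_classes = 3   # toy value; ImageNet-1k has 1000 classes
k_per_class = 4   # toy value; should be 100 for a 100K codebook
feat_dim = 8      # toy feature dimension

codebook = []
for c in range(num_classes):
    # Stand-in for the image features extracted for class c.
    feats = rng.normal(size=(200, feat_dim)).astype(np.float32)
    km = MiniBatchKMeans(n_clusters=k_per_class, batch_size=64,
                         n_init=3, random_state=0).fit(feats)
    codebook.append(km.cluster_centers_)

# Final codebook: num_classes * k_per_class entries.
codebook = np.concatenate(codebook, axis=0)
print(codebook.shape)  # (12, 8) here; (100000, feat_dim) in the real setting
```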
Hey, I have a few more issues regarding the GPT training reproduction:
The transformer in GPT has 509.42M trainable parameters rather than the 404M claimed in the paper. Is that correct?
For GPT testing, there is no file named eval_generation_imagenet.py. Should it be eval_generation.py instead?
Also for GPT testing, shouldn't the evaluation ground-truth directory be the val subdirectory rather than train?
Thank you for reporting these issues.
For the size of GPT, the 404M figure is reported only for the transformer layers. Thus, we believe 509.42M is correct once the token embeddings are included.
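As a rough sanity check (the hidden size below is an assumption for illustration, not a value from the paper), a ~100K-entry token embedding alone is on the same order as the roughly 105M gap between 509.42M and 404M:

```python
vocab_size = 100_000  # VQGAN-LC codebook size
d_model = 1024        # hypothetical hidden size, for illustration only
embedding_params = vocab_size * d_model
print(embedding_params)  # 102400000, i.e. ~102M extra parameters
```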
"eval_generation_imagenet.py" should be "eval_generation.py" and we have corrected it in the README.
For GPT testing, we follow existing works to compare the FID between 50,000 generated images and the ImageNet training set. Thus, the evaluation ground truth directory is correct.
Thanks for your reply. For GPT testing, as far as I know, a more common practice nowadays is to directly adopt the evaluation script from OpenAI, where the statistics are computed over the whole dataset. Could you also provide that eval script and the corresponding results? Thanks!
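For context, the FID such evaluators report is the Fréchet distance between Gaussians fitted to the reference and generated feature activations. A minimal numpy/scipy sketch of that formula (not the OpenAI script itself):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # sqrtm may return tiny imaginary parts due to numerical error.
    covmean = sqrtm(sigma1 @ sigma2).real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy check: identical statistics give a (near-)zero distance.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
mu, sigma = x.mean(axis=0), np.cov(x, rowvar=False)
print(frechet_distance(mu, sigma, mu, sigma))  # ~0
```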
Hi, thanks for your great work!
I would like to know the exact number of GPUs used during the image quantization stage. Your paper mentions "utilizing 32 Nvidia V100 GPUs", but the training script in the repo seems to indicate 8 cards (--nproc-per-node 8). So what is the exact number of GPUs, and the corresponding batch size?