zh460045050 / VQGAN-LC


Reproduction details #3

Open ChangyaoTian opened 3 months ago

ChangyaoTian commented 3 months ago

Hi, thanks for your great work!

I would like to know the exact number of GPUs used during the image quantization stage. I see in your paper that you mentioned "utilizing 32 Nvidia V100 GPUs", but the training script in the repo seems to indicate 8 cards (--nproc-per-node 8). So, what is the exact number of GPUs and the corresponding batch size?

zh460045050 commented 3 months ago

We used a total of 4 nodes, each equipped with 8 V100 GPUs. Thus, the total number of GPUs was 32. The batch size was set to 256, resulting in 8 batches per GPU.
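The arithmetic above can be checked directly (a trivial sketch of the DDP batch split; the variable names are illustrative, not from the repo):

```python
# Sanity check of the distributed setup described above (standard DDP semantics:
# the global batch is sharded evenly across all ranks).
nodes = 4
gpus_per_node = 8          # matches --nproc-per-node 8 in the repo script
world_size = nodes * gpus_per_node
global_batch = 256

per_gpu_batch = global_batch // world_size
print(world_size, per_gpu_batch)  # prints: 32 8
```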

ChangyaoTian commented 3 months ago

Got it, thanks!

ChangyaoTian commented 3 months ago

Could you provide the exact hyperparameters used to reproduce the training of VQGAN-LC on ImageNet-1k? Some key parameters are mismatched between the paper and the example script, such as the number of GPUs, batch size, learning rate, and number of epochs.

[screenshots]

Thanks!

ChangyaoTian commented 3 months ago

Another mismatch, found in codebook generation: the default k value in codebook_generation/minibatch_kmeans_per_class.py is 1000, which means the generated codebook size will be 1M (1000 classes × 1000 centroids) rather than 100,000.

Does it need to be changed to 100 instead?

zh460045050 commented 3 months ago

Thank you for reporting these issues. The learning rate is set to 5e-4. We trained the tokenizers for only 20 epochs for comparison purposes, and we think longer training times may contribute to higher performance. Additionally, the k value in "minibatch_kmeans_per_class.py" should be set to 100 to generate a 100K codebook.
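For reference, per-class mini-batch k-means with k=100 can be sketched as follows (the function name `build_codebook` and the feature layout are assumptions for illustration; the repo's `minibatch_kmeans_per_class.py` may differ in its details):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

K_PER_CLASS = 100  # not 1000: 1000 ImageNet classes x 100 centroids = 100K codebook

def build_codebook(features_by_class, k=K_PER_CLASS):
    """Cluster each class's features separately and stack the centroids.

    features_by_class: dict mapping class_id -> (N_i, D) feature array.
    Returns a (num_classes * k, D) codebook.
    """
    centroids = []
    for cls_id in sorted(features_by_class):
        km = MiniBatchKMeans(n_clusters=k, batch_size=1024, n_init=3, random_state=0)
        km.fit(features_by_class[cls_id])
        centroids.append(km.cluster_centers_)
    return np.concatenate(centroids, axis=0)
```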

ChangyaoTian commented 3 months ago

Hey, I have a few more issues regarding the GPT training reproduction: the transformer in GPT has 509.42M trainable parameters rather than the 404M claimed in the paper. Is that correct?

[screenshot]
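One way to sanity-check where the extra parameters could come from, assuming a standard PyTorch module (`count_params` is a hypothetical helper for illustration, not from the repo): a token embedding table with a ~100K-entry vocabulary at a width around 1024 alone contributes on the order of 100M parameters.

```python
import torch.nn as nn

def count_params(model: nn.Module, exclude_embeddings: bool = False) -> int:
    """Count trainable parameters, optionally excluding nn.Embedding tables."""
    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    if exclude_embeddings:
        total -= sum(p.numel()
                     for m in model.modules() if isinstance(m, nn.Embedding)
                     for p in m.parameters() if p.requires_grad)
    return total
```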

ChangyaoTian commented 3 months ago

And for GPT testing, there is no file named eval_generation_imagenet.py. Should it be eval_generation.py instead?

ChangyaoTian commented 3 months ago

Also for GPT testing, shouldn't the evaluation ground-truth directory here be the val subdirectory rather than train?

[screenshot]

zh460045050 commented 3 months ago

Thank you for reporting these issues.

  1. For the size of GPT, the parameters (404M) are reported for only the transformer layers. Thus, we believe 509.42M is correct when including the token embeddings.

  2. "eval_generation_imagenet.py" should be "eval_generation.py" and we have corrected it in the README.

  3. For GPT testing, we follow existing works to compare the FID between 50,000 generated images and the ImageNet training set. Thus, the evaluation ground truth directory is correct.
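For context, FID is the Fréchet distance between two Gaussians fitted to Inception features of the two image sets (here, 50,000 generated images vs the ImageNet training set). The distance itself, given precomputed means and covariances, is the standard formula below (a generic sketch, not the repo's evaluation code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(s1 + s2 - 2 * sqrt(s1 @ s2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```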

ChangyaoTian commented 3 months ago

Thanks for your reply. For GPT testing, as far as I know, a more common practice nowadays is to directly adopt the evaluation script from OpenAI, where the statistics are calculated over the whole dataset. Could you also provide the evaluation script and the corresponding results? Thanks!
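For reference, the OpenAI evaluation suite mentioned here (from the openai/guided-diffusion repo, evaluations/ directory) compares a samples .npz against a precomputed reference batch; a typical invocation looks like the following (the file names are the ones distributed with that repo and may need adjusting):

```shell
# VIRTUAL_imagenet256_labeled.npz is the precomputed ImageNet-256 reference batch
# from the guided-diffusion repo; samples.npz packs the 50,000 generated images.
python evaluator.py VIRTUAL_imagenet256_labeled.npz samples.npz
```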