Hi, thanks for your interest!
(a) For the FVD and KVD metrics, please check https://github.com/SongweiGe/TATS#synthesis for the scripts. For IS, we generally generate 10K samples and compute it offline with the code from the tgan2 repo (https://github.com/pfnet-research/tgan2). A short sketch of the Fréchet distance computation is included below for reference.
(b) It depends on the model. For the longest run, training the UCF model took 10 days on 8 V100 GPUs. Please check Appendix B.2 of our paper for more details.
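For reference on (a): FVD is the Fréchet distance between Gaussians fitted to I3D embeddings of the real and generated videos (KVD uses a kernel-based distance on the same embeddings). Below is a minimal sketch of just the final Fréchet distance step, assuming the embeddings have already been extracted as numpy arrays; for the reported numbers, please use the linked scripts, which also handle the embedding extraction and exact protocol.

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_emb: np.ndarray, fake_emb: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    real_emb, fake_emb: arrays of shape (num_videos, embedding_dim),
    e.g. I3D features extracted by the evaluation script.
    """
    mu_r, mu_f = real_emb.mean(axis=0), fake_emb.mean(axis=0)
    sigma_r = np.cov(real_emb, rowvar=False)
    sigma_f = np.cov(fake_emb, rowvar=False)

    # Matrix square root of the covariance product; keep only the real part
    # to drop tiny imaginary components caused by numerical error.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    covmean = covmean.real

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```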
Thank you for your reply.
Can you please provide the exact script for computing the FVD and KVD metrics for the Sky Time-lapse dataset?
Currently I am using - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./checkpoint/uncond_gpt_sky_128_488_23999__version_50615218__epoch=4024-step=329999-train.ckpt --vqgan_ckpt ./checkpoint/vqgan_sky_128_488_epoch=12-step=29999-train.ckpt --save ./TATS/output --data_path ./TATS/data/sky_timelapse/sky_test --batch_size 16 --top_k 2048 --top_p 0.8 --dataset sky --compute_fvd --save_videos
I am getting the error: FileNotFoundError: [Errno 2] No such file or directory: './TATS/data/sky_timelapse/sky_test/train/metadata_16.pkl'
Could you please help with this?
It looks like you have successfully generated samples but failed when loading the real videos. You need to specify the data_path as `./TATS/data/sky_timelapse/`, with two subfolders called `train/` and `test/`. You also need to add the flag `--image_folder` when using the sky dataset, since it stores the videos as frames. Let me know if you have more questions!
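Putting the two changes together, the sampling command from your message would look roughly like:
python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./checkpoint/uncond_gpt_sky_128_488_23999__version_50615218__epoch=4024-step=329999-train.ckpt --vqgan_ckpt ./checkpoint/vqgan_sky_128_488_epoch=12-step=29999-train.ckpt --save ./TATS/output --data_path ./TATS/data/sky_timelapse --image_folder --batch_size 16 --top_k 2048 --top_p 0.8 --dataset sky --compute_fvd --save_videos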
Thank you SongweiGe for helping.
I have obtained the required FVD and KVD metrics for the Sky Time-lapse dataset.
But when I run the same on the Taichi dataset, I get FVD = 261.24 and KVD = 48.26.
Script - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./mn/TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos
Results:
saved videos to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/videos/taichi/topp0.80_topk2048_run0/generation_128.avi
saving numpy file to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/numpy_files/taichi/topp0.80_topk2048_run0_eval.npy...
computing fvd embeddings for real videos
caoncat fvd embeddings for real videos
computing fvd embeddings for fake videos
caoncat fvd embeddings for fake videos
FVD = 261.24
KVD = 48.26
Can you please help me get the numbers reported in the paper?
Nice.
Your Taichi command looks good to me, except that you are missing `--sample_every_n_frames 4`. This should only affect the real embeddings. For a quick sanity check of whether this fixes the problem, you may skip the generation part and only compute the FVD with the generated npy file and the new dataloader.
Thank you for your reply.
As you suggested, I changed the script, but I am still running into an error. Script - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./mn/TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4
Error:
saved videos to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/videos/taichi/topp0.80_topk2048_run0/generation_128.avi
saving numpy file to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/numpy_files/taichi/topp0.80_topk2048_run0_eval.npy...
computing fvd embeddings for real videos
Traceback (most recent call last):
File "./scripts/sample_vqgan_transformer_short_videos.py", line 116, in
Could you please kindly look at this?
Can you try to add `--resolution 128 --sequence_length 64` to your command?
Thank you SongweiGe for helping. I have been training on the Sky Time-lapse dataset and am getting an error; can you please look at this?
COMMAND_TO_RUN="python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/sky_timelapse --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4"
Global seed set to 1234
Error:
Traceback (most recent call last):
File "./scripts/train_vqgan.py", line 73, in
Can you please help me with this?
I'm more than happy to help. This looks like a dataloader issue. It usually happens when your cache file has a different path from where your videos are actually stored. You may manually examine the cache files under your data folder to debug this; their file names should be similar to `metadata_16.pkl`. You can run `import pickle; pickle.load(open('metadata_16.pkl', 'rb'))` to check this.
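For example, a rough inspection sketch along these lines can help; note that the exact contents of the pickle are an assumption here and may differ in your version of the code:

```python
import os
import pickle

# Rough inspection of the dataloader cache. NOTE: the exact structure of
# metadata_16.pkl (list of paths, dict, etc.) is an assumption and may
# differ in your version of the code.
cache_path = './TATS/data/sky_timelapse/train/metadata_16.pkl'

with open(cache_path, 'rb') as f:
    metadata = pickle.load(f)

print(type(metadata))

# If the cache turns out to store file paths, check that they still exist
# on this machine; stale absolute paths are a common cause of the error.
entries = metadata if isinstance(metadata, (list, tuple)) else []
paths = [e for e in entries if isinstance(e, str)]
missing = [p for p in paths if not os.path.exists(p)]
print(f'{len(missing)} of {len(paths)} cached paths are missing')
```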
Thank you SongweiGe for helping me out.
Can you please let me know how you split the UCF-101 dataset, or share the code for creating the train and test folders for UCF, so that I can reproduce the numbers in the table?
Of course, I will send it to you through email!
Thank you SongweiGe.
Can you please help me with this?
I trained the VQGAN to epoch=23-step=194999-train and the transformer to epoch=12-step=420000-train, but surprisingly I got FVD = 468.32 and KVD = 95.61.
script for training vqgan - python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4
Script for training transformer - python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --data_path ./TATS/data/taichi --default_root_dir ./mn/TATS --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 16 --max_steps 2000000
script for testing - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./lightning_logs/version_3953207/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./TATS/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64
Can you please help me out with this?
Hi Amandeep, you also need `--sample_every_n_frames 4` when training your VQGAN and transformer, to make them consistent with inference. One quick sanity check you can do is to remove `--sample_every_n_frames 4` from your testing command and see if you get a reasonable FVD. This configuration follows previous works to allow a fair comparison.
After removing `--sample_every_n_frames 4` from the testing code, I get FVD = 3245.95 and KVD = 1767.22.
Interesting... I would expect the opposite if you trained without skipped frames. You may visually check some results to see if they are reasonable, and compare them with our released checkpoints. To help as best I can: how long did you train the VQGAN and transformer?
I have trained the VQGAN for 60 hours and the transformer for around 50 hours.
Can you let me know how many iterations you trained for? Usually training longer helps the transformer a lot, but not the VQGAN.
I have trained the transformer to epoch=13-step=450000-train.
Yeah, that should be long enough to give a good FVD. I have a few thoughts on how to debug this:
I tried to test whether the VQGAN is working properly or not, so I used the provided pre-trained transformer, but I am getting an error.
script - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64
Error : RuntimeError: Error(s) in loading state_dict for Net2NetTransformer: size mismatch for first_stage_model.encoder.conv_blocks.0.down.conv.weight: copying a param with shape torch.Size([64, 32, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([32, 16, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.0.down.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.1.down.conv.weight: copying a param with shape torch.Size([128, 64, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 32, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.1.down.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). 
size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.2.down.conv.weight: copying a param with shape torch.Size([256, 128, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 64, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.2.down.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_first.conv.weight: copying a param with shape torch.Size([32, 3, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 3, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_first.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]). size mismatch for first_stage_model.encoder.final_block.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.up.convt.weight: copying a param with shape torch.Size([256, 256, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 128, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.0.up.convt.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.1.up.convt.weight: copying a param with shape torch.Size([256, 128, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 64, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.1.up.convt.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). 
size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.2.up.convt.weight: copying a param with shape torch.Size([128, 64, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 32, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.2.up.convt.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). 
size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). 
size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_last.conv.weight: copying a param with shape torch.Size([3, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 32, 3, 3, 3]). size mismatch for first_stage_model.pre_vq_conv.conv.weight: copying a param with shape torch.Size([128, 256, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1, 1]). size mismatch for first_stage_model.pre_vq_conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for first_stage_model.post_vq_conv.conv.weight: copying a param with shape torch.Size([256, 128, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]). size mismatch for first_stage_model.post_vq_conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.codebook.embeddings: copying a param with shape torch.Size([16384, 128]) from checkpoint, the shape in current model is torch.Size([16384, 256]). size mismatch for first_stage_model.codebook.z_avg: copying a param with shape torch.Size([16384, 128]) from checkpoint, the shape in current model is torch.Size([16384, 256]).
Can you please tell me which hyperparameter is creating the problem?
Oh... this is a good catch. In our experiments, we actually used `n_hiddens = 32`. We updated the code for release with halved n_hiddens by default but forgot to make the corresponding change in the scripts. This might explain the gap you saw in the FVD. Thanks for catching that!
In terms of the experiment, I think using a trained transformer with another, independently trained VQGAN won't work, since the learned latent spaces are different. What I was suggesting previously was to train another transformer on top of our VQGAN checkpoint for debugging, which might have confused you.
In terms of the sampling, you might have already done this, but I think when you removed `--sample_every_n_frames 4`, you also needed to specify `--sequence_length 16`.
Are you saying to set --sequence_length 64 during training of the VQGAN and transformer?
During testing, I set --sequence_length 64 after removing --sample_every_n_frames 4.
Oh sorry, I meant during testing: after removing `--sample_every_n_frames 4`, you need to set `--sequence_length 16`.
After removing --sample_every_n_frames 4 and setting --sequence_length 16: FVD = 296.21, KVD = 69.80.
Before removing --sample_every_n_frames 4, with --sequence_length 64: FVD = 3245.95, KVD = 1767.22.
Currently I am using the script - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/lightning_logs/version_3953207/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./TATS/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --resolution 128 --sequence_length 16
How can I get the numbers reported in the table after this?
training script for vqgan - python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4
training script for transformer:- python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --data_path ./TATS/data/taichi --default_root_dir /proj/cvl/users/x_fahkh/mn/TATS --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 16 --max_steps 2000000
Can you try using the correct hyperparameters `n_hiddens=32`, `sequence_length=64`, and `sample_every_n_frames=4` to train the VQGAN and transformer models on the Taichi dataset? Maybe train the VQGAN for 30k steps. Let me know if they give you the correct FVDs.
Thank you SongweiGe for your response.
As you suggested, I used all the correct hyperparameters, but I am still unable to get the required numbers.
I have analyzed and evaluated the VQGAN; from my analysis, the VQGAN is performing well, but the problem seems to be with the transformer.
Training VQGAN - python ./scripts/train_vqgan.py --embedding_dim 128 --n_codes 16384 --n_hiddens 32 --downsample 4 8 8 --no_random_restart --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TAT --resolution 128 --sequence_length 64 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4 --sample_every_n_frames 4
Training Transformer - python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt --data_path ./TATS/data/taichi --default_root_dir ./TAT --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 64 --max_steps 2000000 --sample_every_n_frames 4
Testing code - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TAT/lightning_logs/version_3954521/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt --save ./TAT/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64
Results - FVD = 153.40, KVD = 19.57. Can you please have a look?
The training scripts look good to me. I think you might want to do a sweep over the top_k and top_p hyperparameters in your testing script. For instance, you can try top_k=16384, top_p=0.92. If this affects your FVD, you may sweep further, say top_k=1024, 2048, 4096, 8192 and top_p=0.75, 0.8, 0.92, 0.98 (a small sweep-driver sketch is included below). Please also do a visual check to see if there are obvious diversity or quality issues; these hyperparameters can affect the quality a lot.
Could you also let me know how long you trained your VQGAN and transformer? You could consider training the transformer a bit longer and monitoring how the FVD changes.
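If it helps, a small driver like the hypothetical sketch below can run such a sweep; the checkpoint paths and flags are copied from the testing command above and may need adjusting:

```python
import itertools
import subprocess

# Hypothetical sweep driver over the sampling hyperparameters discussed above.
# Checkpoints, data path, and flags are taken from the testing command in this
# thread; adjust them for your setup.
top_ks = [1024, 2048, 4096, 8192, 16384]
top_ps = [0.75, 0.80, 0.92, 0.98]

for top_k, top_p in itertools.product(top_ks, top_ps):
    cmd = [
        'python', './scripts/sample_vqgan_transformer_short_videos.py',
        '--gpt_ckpt', './TAT/lightning_logs/version_3954521/checkpoints/best_checkpoint.ckpt',
        '--vqgan_ckpt', './TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt',
        # Separate output folder per configuration so results are not overwritten.
        '--save', f'./TAT/output/taichi_k{top_k}_p{top_p}',
        '--data_path', './TATS/data/taichi',
        '--batch_size', '16',
        '--top_k', str(top_k), '--top_p', str(top_p),
        '--dataset', 'taichi', '--compute_fvd', '--save_videos',
        '--sample_every_n_frames', '4', '--resolution', '128', '--sequence_length', '64',
    ]
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)
```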
I have trained the transformer to epoch=13-step=370000-train.
I'm closing this issue for now. Please feel free to reopen this if you have more questions.
A few queries:
(a) Can you please provide the evaluation code for reproducing Tables 1(a), 1(b), 1(c), and 1(d)?
(b) Can you please let me know the total computation hours needed to train the full model?