songweige / TATS

Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
MIT License

Great Work!!!!! #5

Closed VIROBO-15 closed 2 years ago

VIROBO-15 commented 2 years ago

A few queries:

(a) Could you please provide the evaluation code for reproducing Tables 1(a), 1(b), 1(c), and 1(d)? (b) Could you please let me know the total compute hours needed to train the full model?

songweige commented 2 years ago

Hi, thanks for your interest!

(a) For the FVD and KVD metrics, please check https://github.com/SongweiGe/TATS#synthesis for the scripts. For IS, we generally generated 10K samples and computed the score offline with the code from the tgan2 repo (https://github.com/pfnet-research/tgan2).

(b) It depends on the model. For the longest training of the UCF model, it took 10 days on 8 V100 GPUs. Please check our Appendix B.2 for more details.

VIROBO-15 commented 2 years ago

Thank you for your reply.

Could you please provide the exact script for computing the FVD and KVD metrics on the Sky Time-lapse dataset?

Currently I am using - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./checkpoint/uncond_gpt_sky_128_488_23999__version_50615218__epoch=4024-step=329999-train.ckpt --vqgan_ckpt ./checkpoint/vqgan_sky_128_488_epoch=12-step=29999-train.ckpt --save ./TATS/output --data_path ./TATS/data/sky_timelapse/sky_test --batch_size 16 --top_k 2048 --top_p 0.8 --dataset sky --compute_fvd --save_videos

I am getting the error: FileNotFoundError: [Errno 2] No such file or directory: './TATS/data/sky_timelapse/sky_test/train/metadata_16.pkl'

Could you please help with this?

songweige commented 2 years ago

It looks like you have successfully generated samples but failed when loading the real videos. You need to set --data_path to ./TATS/data/sky_timelapse/, which should contain the two subfolders train/ and test/. You also need to add the flag --image_folder when using the sky dataset, since it stores the videos as frames. Let me know if you have more questions!
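
For reference, based on the command you posted, the fixed call might look roughly like the sketch below. This is only a sketch; the checkpoint and output paths are the ones from your message, so adjust them to your setup.

```bash
# Sketch of the fixed Sky Time-lapse evaluation command: --data_path points at the
# folder containing train/ and test/, and --image_folder is added because the sky
# dataset stores videos as frames. Paths are copied from the earlier message.
python ./scripts/sample_vqgan_transformer_short_videos.py \
    --gpt_ckpt ./checkpoint/uncond_gpt_sky_128_488_23999__version_50615218__epoch=4024-step=329999-train.ckpt \
    --vqgan_ckpt ./checkpoint/vqgan_sky_128_488_epoch=12-step=29999-train.ckpt \
    --save ./TATS/output --data_path ./TATS/data/sky_timelapse \
    --batch_size 16 --top_k 2048 --top_p 0.8 --dataset sky \
    --image_folder --compute_fvd --save_videos
```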

VIROBO-15 commented 2 years ago

Thank you, SongweiGe, for the help.

I have obtained the required FVD and KVD metrics for the Sky Time-lapse dataset.

But when I run the same on the Taichi dataset, I get FVD = 261.24 and KVD = 48.26.

Script - python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./mn/TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos

Results:
saved videos to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/videos/taichi/topp0.80_topk2048_run0/generation_128.avi
saving numpy file to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/numpy_files/taichi/topp0.80_topk2048_run0_eval.npy...
computing fvd embeddings for real videos
caoncat fvd embeddings for real videos
computing fvd embeddings for fake videos
caoncat fvd embeddings for fake videos
FVD = 261.24
KVD = 48.26

Could you please help me get the reported numbers?

songweige commented 2 years ago

Nice.

Your Taichi command looks good to me, except that you are missing `--sample_every_n_frames 4`. This should only affect the real embeddings. For a quick sanity check to see if this fixes the problem, you may skip the generation part and only compute the FVD from the generated npy file with the new dataloader.

VIROBO-15 commented 2 years ago

Thank you for your reply.

As you suggested, I changed the script, but I am now getting the error below. Script: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./mn/TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4

Error:
saved videos to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/videos/taichi/topp0.80_topk2048_run0/generation_128.avi
saving numpy file to /proj/cvl/users/x_fahkh/mn/TATS/output/taichi/numpy_files/taichi/topp0.80_topk2048_run0_eval.npy...
computing fvd embeddings for real videos
Traceback (most recent call last):
  File "./scripts/sample_vqgan_transformer_short_videos.py", line 116, in <module>
    real_embeddings.append(get_fvd_logits(shift_dim((batch['video']+0.5)*255, 1, -1).byte().data.numpy(), i3d=i3d, device=device))
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/fvd/fvd.py", line 31, in get_fvd_logits
    embeddings = get_logits(i3d, videos, device)
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/fvd/fvd.py", line 125, in get_logits
    logits.append(i3d(batch))
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/fvd/pytorch_i3d.py", line 343, in forward
    x = self.logits(self.dropout(self.avg_pool(x)))
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 708, in forward
    return F.avg_pool3d(input, self.kernel_size, self.stride,
RuntimeError: input image (T: 1 H: 7 W: 7) smaller than kernel size (kT: 2 kH: 7 kW: 7)
srun: error: node026: task 0: Exited with exit code 1

Could you please look into this?

songweige commented 2 years ago

Can you try to add --resolution 128 --sequence_length 64 to your command?
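
Concretely, something like the sketch below, based on the command you posted (the paths are the ones from your message):

```bash
# Same command as before, with the two extra flags added at the end -- a sketch only.
python ./scripts/sample_vqgan_transformer_short_videos.py \
    --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt \
    --vqgan_ckpt ./mn/TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt \
    --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi \
    --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi \
    --compute_fvd --save_videos --sample_every_n_frames 4 \
    --resolution 128 --sequence_length 64
```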

VIROBO-15 commented 2 years ago

Thank you, SongweiGe, for the help. I have been training on the Sky Time-lapse dataset and am getting an error; could you please take a look?

COMMAND_TO_RUN="python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/sky_timelapse --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4"

Global seed set to 1234

Error:
Traceback (most recent call last):
  File "./scripts/train_vqgan.py", line 73, in <module>
    main()
  File "./scripts/train_vqgan.py", line 24, in main
    data.train_dataloader()
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/data.py", line 296, in train_dataloader
    return self._dataloader(True)
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/data.py", line 285, in _dataloader
    dataloader = data.DataLoader(
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 347, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 106, in __init__
    if not isinstance(self.num_samples, int) or self.num_samples <= 0:
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 114, in num_samples
    return len(self.data_source)
  File "/proj/cvl/users/x_fahkh/mn/TATS/tats/data.py", line 74, in __len__
    return self._clips.num_clips()
  File "/home/x_fahkh/.conda/envs/tats/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 267, in num_clips
    return self.cumulative_sizes[-1]
IndexError: list index out of range

Could you please help with this?

songweige commented 2 years ago

I'm more than happy to help. This looks like a dataloader issue. It usually happens when your cache file records a different path from where your videos are actually stored. You can manually examine the cache files under your data folder to debug this; the file name should be similar to metadata_16.pkl. You can run import pickle; pickle.load(open('metadata_16.pkl', 'rb')) to check this.
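
For example, something along these lines. The cache path here is just a guess based on your --data_path, and whether the pickle is a dict with a 'video_paths' key (as in torchvision's VideoClips metadata) is an assumption; if it is not, the snippet simply prints the whole object so you can inspect it.

```bash
# Load the cached metadata and print the recorded video paths so they can be
# compared with where the frames actually live on disk.
python -c "
import pickle
meta = pickle.load(open('./TATS/data/sky_timelapse/train/metadata_16.pkl', 'rb'))
print(meta['video_paths'][:5] if isinstance(meta, dict) and 'video_paths' in meta else meta)
"
```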

VIROBO-15 commented 2 years ago

Thank you, SongweiGe, for helping me out.

Could you let me know how you split the UCF-101 dataset, or share the code for creating the train and test folders for UCF, so that I can reproduce the numbers in the table?

songweige commented 2 years ago

Of course, I will send it through email!

VIROBO-15 commented 2 years ago

Thank you, SongweiGe.

Could you please help me with this?

I trained the VQGAN to epoch=23-step=194999 and the transformer to epoch=12-step=420000, but surprisingly I got FVD = 468.32 and KVD = 95.61.

Script for training the VQGAN: python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4

Script for training the transformer: python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --data_path ./TATS/data/taichi --default_root_dir ./mn/TATS --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 16 --max_steps 2000000

Script for testing: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./lightning_logs/version_3953207/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./TATS/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64

Could you please help me with this?

songweige commented 2 years ago

Hi Amandeep, you also need --sample_every_n_frames 4 when training your VQGAN and transformer, to make training consistent with inference. One quick sanity check you can do is to remove --sample_every_n_frames 4 from your testing command and see if you get a reasonable FVD. This configuration follows previous works to allow a fair comparison.

VIROBO-15 commented 2 years ago

After removing --sample_every_n_frames 4 from the testing code, I get FVD = 3245.95 and KVD = 1767.22.

songweige commented 2 years ago

Interesting... I would expect the opposite if you trained without skipped frames. You may visually check some results to see if they look reasonable and compare them with our released checkpoints. So that I can help as best I can, how long did you train the VQGAN and transformer?

VIROBO-15 commented 2 years ago

I trained the VQGAN for 60 hours and the transformer for around 50 hours.

songweige commented 2 years ago

Can you let me know how many iterations you trained for? Usually training longer helps the transformer a lot, but not the VQGAN.

VIROBO-15 commented 2 years ago

I trained the transformer to epoch=13-step=450000.

songweige commented 2 years ago

Yeah, that should be long enough to give a good FVD. I have a few thoughts on how to debug this:

  1. Compute the FVD of your VQGAN reconstructions - that is the upper bound you can achieve with generation. Usually I chose checkpoints between 20k and 50k steps.
  2. Use our released VQGAN checkpoint to check whether the issue is with the transformer (see the sketch after this list).
  3. Make sure to use consistent hyperparameters for the training and testing dataloaders.
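
For point 2, a sketch of what I mean: train a new transformer on top of our released Taichi VQGAN checkpoint while keeping the rest of your training flags as they were. The paths below are simply taken from your earlier messages, so adjust them to your setup.

```bash
# Sketch only: the transformer training command from the earlier message, with
# --vqvae pointed at the released Taichi VQGAN checkpoint instead of the
# self-trained one; all other flags are unchanged.
python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 \
    --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional \
    --vqvae ./TATS/checkpoint/vqgan_taichi_128_488_epoch=6-step=45999-train.ckpt \
    --data_path ./TATS/data/taichi --default_root_dir ./mn/TATS \
    --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 \
    --resolution 128 --sequence_length 16 --max_steps 2000000
```
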
VIROBO-15 commented 2 years ago

I tried to test whether the VQGAN is working correctly, so I used the given pre-trained transformer, but I am getting an error.

Script: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/checkpoint/uncond_gpt_taichi_128_488_45999_epoch=12-step=529999-train.ckpt --vqgan_ckpt ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./mn/TATS/output/taichi --data_path ./mn/TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64

Error : RuntimeError: Error(s) in loading state_dict for Net2NetTransformer: size mismatch for first_stage_model.encoder.conv_blocks.0.down.conv.weight: copying a param with shape torch.Size([64, 32, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([32, 16, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.0.down.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.0.res.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.encoder.conv_blocks.1.down.conv.weight: copying a param with shape torch.Size([128, 64, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 32, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.1.down.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). 
size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.1.res.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.encoder.conv_blocks.2.down.conv.weight: copying a param with shape torch.Size([256, 128, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 64, 4, 4, 4]). size mismatch for first_stage_model.encoder.conv_blocks.2.down.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_blocks.2.res.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.conv_first.conv.weight: copying a param with shape torch.Size([32, 3, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 3, 3, 3, 3]). size mismatch for first_stage_model.encoder.conv_first.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]). size mismatch for first_stage_model.encoder.final_block.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.encoder.final_block.0.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.final_block.0.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.up.convt.weight: copying a param with shape torch.Size([256, 256, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 128, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.0.up.convt.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res1.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv1.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv1.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.norm2.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv2.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.0.res2.conv2.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.decoder.conv_blocks.1.up.convt.weight: copying a param with shape torch.Size([256, 128, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 64, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.1.up.convt.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). 
size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res1.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv1.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.norm2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv2.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.1.res2.conv2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for first_stage_model.decoder.conv_blocks.2.up.convt.weight: copying a param with shape torch.Size([128, 64, 4, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 32, 4, 4, 4]). size mismatch for first_stage_model.decoder.conv_blocks.2.up.convt.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). 
size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res1.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv1.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv1.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). 
size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv2.conv.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3, 3]). size mismatch for first_stage_model.decoder.conv_blocks.2.res2.conv2.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]). size mismatch for first_stage_model.decoder.conv_last.conv.weight: copying a param with shape torch.Size([3, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 32, 3, 3, 3]). size mismatch for first_stage_model.pre_vq_conv.conv.weight: copying a param with shape torch.Size([128, 256, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1, 1]). size mismatch for first_stage_model.pre_vq_conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for first_stage_model.post_vq_conv.conv.weight: copying a param with shape torch.Size([256, 128, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]). size mismatch for first_stage_model.post_vq_conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for first_stage_model.codebook.embeddings: copying a param with shape torch.Size([16384, 128]) from checkpoint, the shape in current model is torch.Size([16384, 256]). size mismatch for first_stage_model.codebook.z_avg: copying a param with shape torch.Size([16384, 128]) from checkpoint, the shape in current model is torch.Size([16384, 256]).

Could you please tell me which hyperparameter is causing the problem?

songweige commented 2 years ago

Oh... this is a good catch. In our experiments, we actually used n_hiddens = 32. We updated the released code to use half the n_hiddens by default but forgot to make the corresponding change in the scripts. This might explain the gap you saw in the FVD. Thanks for catching that!

In terms of the experiment, I think using a trained transformer with an independently trained VQGAN won't work, since the learned latent spaces are different. What I was suggesting earlier was to train another transformer with our VQGAN checkpoint for debugging, which might have been confusing.

In terms of the sampling, you might have already done this, but when you remove --sample_every_n_frames 4, you also need to specify --sequence_length 16.

VIROBO-15 commented 2 years ago

Are you saying I should set --sequence_length 64 when training the VQGAN and transformer?

During testing, I set --sequence_length 64 after removing --sample_every_n_frames 4.

songweige commented 2 years ago

Oh sorry, I meant during the testing, after removing --sample_every_n_frames 4, you need to set --sequence_length 16.

VIROBO-15 commented 2 years ago

After removing --sample_every_n_frames 4 and setting --sequence_length 16: FVD = 296.21, KVD = 69.80. Before removing --sample_every_n_frames 4 (with --sequence_length 64): FVD = 3245.95, KVD = 1767.22.

Currently I am using the script: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/lightning_logs/version_3953207/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./TATS/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --resolution 128 --sequence_length 16

VIROBO-15 commented 2 years ago

How can I get the numbers reported in the table after this process?

Currently I am using the testing script: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TATS/lightning_logs/version_3953207/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --save ./TATS/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --resolution 128 --sequence_length 16

Training script for the VQGAN: python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 16 --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TATS --resolution 128 --sequence_length 16 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4

Training script for the transformer: python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TATS/lightning_logs/version_3942275/checkpoints/latest_checkpoint_prev.ckpt --data_path ./TATS/data/taichi --default_root_dir /proj/cvl/users/x_fahkh/mn/TATS --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 16 --max_steps 2000000

songweige commented 2 years ago

Can you try using the correct hyperparameters n_hiddens=32, sequence_length=64, and sample_every_n_frames=4 to train the VQGAN and transformer models on the Taichi dataset? Maybe train the VQGAN for 30k steps. Let me know if that gives you the correct FVDs.
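
Concretely, your earlier VQGAN training command with those three settings changed would look roughly like the sketch below. The other flags are kept as you had them; you may also want to double-check flags such as --embedding_dim against the released configs.

```bash
# Sketch only: the earlier Taichi VQGAN training command with n_hiddens=32,
# sequence_length=64, and sample_every_n_frames=4; all other flags unchanged.
python ./scripts/train_vqgan.py --embedding_dim 256 --n_codes 16384 --n_hiddens 32 \
    --downsample 4 8 8 --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 \
    --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 \
    --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TATS \
    --resolution 128 --sequence_length 64 --sample_every_n_frames 4 \
    --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 \
    --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4
```

Use the same sequence_length and sample_every_n_frames settings when training the transformer.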

VIROBO-15 commented 2 years ago

Thank you, SongweiGe, for your response.

As you suggested, I used all the correct hyperparameters, but I am still unable to get the reported numbers.

I analyzed and evaluated the VQGAN; from my analysis the VQGAN is performing well, but the problem is with the transformer.

Training VQGAN - python ./scripts/train_vqgan.py --embedding_dim 128 --n_codes 16384 --n_hiddens 32 --downsample 4 8 8 --no_random_restart --gpus 8 --sync_batchnorm --batch_size 2 --num_workers 32 --accumulate_grad_batches 6 --progress_bar_refresh_rate 500 --max_steps 2000000 --gradient_clip_val 1.0 --lr 3e-5 --data_path ./TATS/data/taichi --default_root_dir ./TAT --resolution 128 --sequence_length 64 --discriminator_iter_start 10000 --norm_type batch --perceptual_weight 4 --image_gan_weight 1 --video_gan_weight 1 --gan_feat_weight 4 --sample_every_n_frames 4

Training Transformer - python ./scripts/train_transformer.py --num_workers 32 --val_check_interval 0.5 --progress_bar_refresh_rate 500 --gpus 8 --sync_batchnorm --batch_size 3 --unconditional --vqvae ./TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt --data_path ./TATS/data/taichi --default_root_dir ./TAT --vocab_size 16384 --block_size 1024 --n_layer 24 --n_head 16 --n_embd 1024 --resolution 128 --sequence_length 64 --max_steps 2000000 --sample_every_n_frames 4

Testing code: python ./scripts/sample_vqgan_transformer_short_videos.py --gpt_ckpt ./TAT/lightning_logs/version_3954521/checkpoints/best_checkpoint.ckpt --vqgan_ckpt ./TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt --save ./TAT/output/taichi --data_path ./TATS/data/taichi --batch_size 16 --top_k 2048 --top_p 0.8 --dataset taichi --compute_fvd --save_videos --sample_every_n_frames 4 --resolution 128 --sequence_length 64

Results: FVD = 153.40, KVD = 19.57. Could you please have a look?

songweige commented 2 years ago

The training scripts look good to me. I think you might want to do a sweep over the top_k and top_p hyperparameters in your testing script. For instance, you can try top_k=16384, top_p=0.92. If this affects your FVD, you may sweep more - say top_k=1024, 2048, 4096, 8192 and top_p=0.75, 0.8, 0.92, 0.98. Please also do a visual check to see whether there are obvious diversity or quality issues; these hyperparameters can affect the quality a lot.
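
A sweep could look like the sketch below; the checkpoint and data paths are copied from your testing command, so adjust them as needed.

```bash
# Sketch of a top_k / top_p sweep over the sampling script; each run writes its
# outputs under a topp*_topk* subfolder, so the resulting FVD/KVD can be compared.
for top_k in 1024 2048 4096 8192 16384; do
  for top_p in 0.75 0.8 0.92 0.98; do
    python ./scripts/sample_vqgan_transformer_short_videos.py \
      --gpt_ckpt ./TAT/lightning_logs/version_3954521/checkpoints/best_checkpoint.ckpt \
      --vqgan_ckpt ./TAT/lightning_logs/version_3954144/checkpoints/latest_checkpoint.ckpt \
      --save ./TAT/output/taichi --data_path ./TATS/data/taichi \
      --batch_size 16 --top_k ${top_k} --top_p ${top_p} --dataset taichi \
      --compute_fvd --save_videos --sample_every_n_frames 4 \
      --resolution 128 --sequence_length 64
  done
done
```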

Could you let me know how long you trained your VQGAN and transformer? You could also consider training the transformer for a bit longer and monitoring the FVD changes.

VIROBO-15 commented 2 years ago

I have trained the transformer to epoch=13-step=370000.

songweige commented 2 years ago

I'm closing this issue for now. Please feel free to reopen this if you have more questions.