lucidrains / magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch
MIT License

The results for CausalConv3d #11

Open Epiphqny opened 8 months ago

Epiphqny commented 8 months ago

Hi @lucidrains, thanks for your awesome work! I used your causal conv implementation to train a video VQGAN network. The results are as follows:

Original clip sequence: 36500_image

Reconstructed clip sequence: 36500_image_prime

I've noticed that the reconstruction seems to rely heavily on the initial frame. As the sequence progresses, the frames lose clarity and become blurrier with each subsequent frame. Could you provide any insights into this phenomenon? Thank you for your time and assistance!
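One way to put a number on the frame-by-frame degradation described above is to compute a quality score per frame instead of one score per clip. The sketch below uses PSNR in plain Python. The function name `per_frame_psnr` and the flat-list frame layout are illustrative assumptions, not part of this repository, and the toy data is synthetic rather than taken from the clips in this issue.

```python
import math


def per_frame_psnr(original, reconstructed, max_value=1.0):
    """Return one PSNR value (in dB) per frame.

    Hypothetical helper for illustration. Each clip is a list of frames,
    and each frame is a flat list of pixel values in [0, max_value].
    """
    if len(original) != len(reconstructed):
        raise ValueError("clips must have the same number of frames")
    scores = []
    for frame, frame_hat in zip(original, reconstructed):
        if len(frame) != len(frame_hat):
            raise ValueError("frames must have the same number of pixels")
        mse = sum((a - b) ** 2 for a, b in zip(frame, frame_hat)) / len(frame)
        # A perfect reconstruction has zero error, so PSNR is unbounded.
        scores.append(math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse))
    return scores


# Synthetic clip whose reconstruction error grows with the frame index.
clip = [[0.5] * 4 for _ in range(3)]
recon = [[0.5 + 0.1 * t] * 4 for t in range(3)]
scores = per_frame_psnr(clip, recon)
```

A score that drops steadily with the frame index would match the visual impression of increasing blur, while a flat curve would suggest the effect is more perceptual than pixel-level.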

lucidrains commented 8 months ago

@Epiphqny wow Yuqing! those results do not look half bad! i'll have to think about your results a bit more. so this work builds upon the C-ViViT from the Phenaki paper. in that paper, i believe they encode the first frame separately from the rest (to allow for single image pretraining). however, in this work, they decide to just pad on the left along the time axis and use the same encoding for the first frame as for the rest. perhaps i can add the C-ViViT way for the sake of comparing the two
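To make the "pad on the left" idea concrete, here is a minimal sketch along the time axis only, in plain Python so it runs without dependencies. It is not the repository's CausalConv3d: the function name `causal_temporal_conv` and the one-scalar-per-frame signal are illustrative assumptions standing in for a full feature map.

```python
def causal_temporal_conv(frames, kernel):
    """Convolve a per-frame scalar signal along time with left-only padding.

    frames: list of floats, one per time step.
    kernel: list of floats; kernel[-1] weights the current frame and
            kernel[0] weights the oldest frame in the window.
    """
    pad = len(kernel) - 1
    # Pad only on the left (the past), so no output can see a future frame.
    padded = [0.0] * pad + list(frames)
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


frames = [1.0, 1.0, 1.0, 1.0]
kernel = [0.25, 0.25, 0.5]
out = causal_temporal_conv(frames, kernel)

# Changing a later frame must not change any earlier output.
changed = causal_temporal_conv([1.0, 1.0, 1.0, 9.0], kernel)
assert changed[:3] == out[:3]
```

With a constant input, the first outputs are smaller than the later ones because their window is mostly zero padding. The first frame goes through the same kernel as every other frame, only with less real context, which is the design choice contrasted here with encoding the first frame separately.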

lucidrains commented 8 months ago

@Epiphqny once i circle back to this, also want to craft out a few more specialized discriminators (fourier domain as well as temporal)

lucidrains commented 8 months ago

@Epiphqny did you use LFQ or FSQ btw? could you share your hyperparameters?

lucidrains commented 8 months ago

@Epiphqny added it here if you want to run some experiments

Epiphqny commented 8 months ago

Hi @lucidrains, thanks for your prompt response! Actually, I didn't use LFQ or FSQ. Instead, I used the quantization from CVQ-VAE (https://github.com/lyndonzheng/CVQ-VAE) and extended the 2D conv to a 3D causal conv, as in magvit2. For the training parameters, I followed the setup used in VQGAN and initialized the weights from a CVQ-VAE model pretrained on image data. I will train with the updated first-frame code, and I'm looking forward to the updated discriminator!
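The comment above does not say how the pretrained 2D weights were mapped onto the 3D causal kernels. One common option, shown here purely as an assumption and not as the method actually used, is to copy the 2D kernel into the last time slice and zero the earlier slices, so the 3D conv initially behaves like the per-frame 2D conv. The name `inflate_kernel_causal` is hypothetical.

```python
def inflate_kernel_causal(kernel_2d, time_size):
    """Build a (time_size, H, W) kernel whose last time slice is kernel_2d.

    Assumed convention: the last time slice weights the current frame, as in
    a conv that pads time_size - 1 frames on the left. Earlier slices are
    zero, so past frames contribute nothing at initialization.
    """
    if time_size < 1:
        raise ValueError("time_size must be at least 1")
    height, width = len(kernel_2d), len(kernel_2d[0])
    zero_slices = [
        [[0.0] * width for _ in range(height)] for _ in range(time_size - 1)
    ]
    # Copy the rows so later edits to the 3D kernel do not alias the 2D one.
    return zero_slices + [[list(row) for row in kernel_2d]]


kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_kernel_causal(kernel_2d, time_size=3)
```

Other schemes exist, such as spreading the 2D weights evenly across time, and they give a different starting point for training, so the choice is worth stating when sharing results.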

lucidrains commented 8 months ago

@Epiphqny ohh i see! i didn't know you only used the causal conv

i'm not sure what the issue is then

Epiphqny commented 8 months ago

@lucidrains Thanks for your response! I will try more modules from this implementation and update the results later.

sijeh commented 4 months ago

Hi @Epiphqny, is there any progress on improving the results?