zlai0 / VideoAutoencoder

Video Autoencoder: self-supervised disentanglement of 3D structure and motion (ICCV 2021). Website: https://zlai0.github.io/VideoAutoencoder/

About Video Following #2

Closed: dichen-cd closed this issue 3 years ago

dichen-cd commented 3 years ago

Fantastic work! Thank you for sharing the code!

I was trying out a video following demo similar to the ones shown on your project page. The appearance image I'm using is Vincent van Gogh’s bedroom, and the motion clip is trimmed from this video, from 2:03 to 2:07. The models are loaded from the provided re10k.ckpt.

Since there's no script for a video following demo, I slightly modified test_re10k.py: the clip in line 85 is replaced with the motion clip mentioned above, and the scene_rep in line 96 is replaced with the encoding of the appearance image. Am I doing it right?
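The change described above can be sketched roughly as follows. This is only an illustration of the data flow, not code from the repository: `encode_scene`, `estimate_pose`, and `render` are hypothetical placeholders for the model components, and estimating each pose relative to the first frame of the motion clip is an assumption.

```python
def follow_video(appearance_image, motion_clip, encode_scene, estimate_pose, render):
    """Render one output frame per frame of the motion clip.

    The three callables are hypothetical stand-ins for the model components;
    they are not the names used in test_re10k.py.
    """
    # Scene structure comes from the still appearance image, not from the clip.
    scene_rep = encode_scene(appearance_image)

    frames = []
    for frame in motion_clip:
        # Assumption: camera pose is estimated relative to the clip's first frame.
        pose = estimate_pose(motion_clip[0], frame)
        frames.append(render(scene_rep, pose))
    return frames
```

The point of the sketch is only that the scene representation and the trajectory come from two different sources, so both must be pre-processed the same way before they reach the networks.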

The result is not quite satisfactory, though. The trajectory estimated by the pose network is not correct, and the generated video gets blurry quickly.

Could you provide any suggestions on how to perform video following, such as the assets used (appearance image, motion clip, and checkpoints) and some critical hyper-parameters (frame_limit, fps, etc.)?

Thank you.

zlai0 commented 3 years ago

Hi Dean, thanks for your interest! The video following task uses the same hyper-parameters and checkpoint as the video synthesis task. The motion clips we used come from the RealEstate10K dataset, and your appearance image looks good to me. Could you first:

1. Double-check that your image and clip are pre-processed the same way our dataloader does it.
2. Try a different image + clip pair to see whether the problem goes away.

dichen-cd commented 3 years ago

Hi @zlai0. Thanks for the quick reply 😊

It turns out that I forgot to resize the appearance image and video to 256 x 256. The problem is fixed now.
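For anyone hitting the same problem, a small pre-flight check along these lines would have caught it. It is a minimal sketch, not part of the repository: `find_size_mismatches` is a hypothetical helper, the input names and sizes in the example are made up, and only the 256 x 256 target comes from this thread.

```python
# Hypothetical helper, not part of the VideoAutoencoder repository.
TARGET_SIZE = (256, 256)  # (width, height) reported in this thread


def find_size_mismatches(sizes, target=TARGET_SIZE):
    """Return (name, size) pairs whose size differs from the target.

    `sizes` maps an input name to its (width, height) in pixels.
    """
    return [
        (name, tuple(size))
        for name, size in sizes.items()
        if tuple(size) != tuple(target)
    ]


if __name__ == "__main__":
    # Made-up sizes for illustration only.
    inputs = {
        "appearance_image": (1280, 1024),
        "motion_frame_0": (256, 256),
    }
    for name, size in find_size_mismatches(inputs):
        print(f"{name}: {size} should be resized to {TARGET_SIZE}")
```

With Pillow, for instance, `Image.size` gives the (width, height) pair to feed into this check and `Image.resize((256, 256))` returns a resized copy. Whether the repository's dataloader also crops before resizing is not stated here, so it is worth comparing against the dataloader code.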