In this paper, we show that simply using 2D VQ-GAN to encode each frame of a video can also generate temporal consistency videos and at the same time benefit from both image and video data.
In the paper, I believe you mean "temporally consistent" here. Subtle change in wording.