songweige / TATS

Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)

The Training of Interpolation Transformer #10

Closed · kangzhao2 closed this issue 2 years ago

kangzhao2 commented 2 years ago

Dear author:

In the training of the interpolation transformer, given that the latent space is 5 × 16 × 16, I found that the first 16 × 16 and the last 16 × 16 tokens participate in gradient propagation. But during inference with the interpolation transformer, the first and last 16 × 16 tokens are given as conditioning. So, in my opinion, the first 16 × 16 and the last 16 × 16 tokens should not take part in gradient back-propagation during training. Please correct me if I'm wrong.

Kang

songweige commented 2 years ago

Hi Kang, good point. I think you are right. I suspect that if you mask out the loss on the initial and last frames, it shouldn't affect the model performance.
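
For concreteness, here is a minimal sketch of the masking suggested above, assuming the interpolation transformer is trained with a token-level cross-entropy over the flattened 5 × 16 × 16 grid of VQ token indices. The function name, tensor layout, and shapes are illustrative assumptions, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F

def interpolation_loss(logits, targets, t=5, h=16, w=16):
    """Cross-entropy over latent tokens, ignoring the conditioning frames.

    logits:  [B, t*h*w, vocab_size] transformer predictions
    targets: [B, t*h*w]             ground-truth VQ token indices

    The first and last h*w positions correspond to the given first and
    last frames, so their loss is masked out and they contribute no
    gradient.
    """
    frame_tokens = h * w

    # Per-token loss, kept unreduced so we can mask individual positions.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)  # [B, t*h*w]

    mask = torch.ones_like(loss)
    mask[:, :frame_tokens] = 0.0   # first (given) frame
    mask[:, -frame_tokens:] = 0.0  # last (given) frame

    # Average only over the predicted (intermediate) frame tokens.
    return (loss * mask).sum() / mask.sum()
```

With this masking, the loss (and hence the gradients) only comes from the intermediate frames, which is what the inference setup of the interpolation transformer actually predicts.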