omerbt / TokenFlow

Official PyTorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
https://diffusion-tokenflow.github.io
MIT License

Is only the last layer of the edited frame processed? #6

Open: gladzhang opened this issue 1 year ago

gladzhang commented 1 year ago

Thanks for your nice work! I have two questions. First, the paper mentions that every layer of the keyframes is processed. When editing the original video frames, is every layer processed as well, or only the last layer? Second, my understanding is that video frames are processed step by step, with the output of each step used as the input to the next. So, according to the paper, all frames should be processed at every step. Is that right?

I look forward to your reply. Thank you again.

omerbt commented 1 year ago

Thanks! First: all layers are processed. The reason is that the architecture has skip connections, so operating on every layer is not equivalent to operating only on the last layer. Regarding the second question: yes, all frames are processed at each step (but we split them into batches during the forward pass to save memory).
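To make the second answer concrete, here is a minimal sketch (not the repository's code) of a sampling loop that updates every frame at every denoising step while only ever passing a bounded batch of frames to the model. Frames are toy floats, and `denoise_all_frames`, `step_fn` and `batch_size` are hypothetical names chosen for illustration. The actual method also shares features across frames (for example, from the edited keyframes), which this sketch deliberately leaves out.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a per-step denoiser mapping a batch of frame
# latents at timestep t to their updated latents (same length, same order).
StepFn = Callable[[List[float], int], List[float]]


def denoise_all_frames(latents: Sequence[float], timesteps: Sequence[int],
                       step_fn: StepFn, batch_size: int) -> List[float]:
    """Run every frame through every denoising step, in memory-sized batches.

    Frames are toy floats here; in practice they would be latent tensors.
    Cross-frame coupling (shared keyframe features) is omitted on purpose.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    current = list(latents)
    for t in timesteps:
        updated: List[float] = []
        # Batching only bounds peak memory: every frame is still updated
        # before the loop moves on to the next timestep.
        for start in range(0, len(current), batch_size):
            batch = current[start:start + batch_size]
            out = step_fn(batch, t)
            if len(out) != len(batch):
                raise ValueError("step_fn must return one latent per frame")
            updated.extend(out)
        # The output of step t becomes the input of the next step.
        current = updated
    return current


def toy_step(batch: List[float], t: int) -> List[float]:
    # Stand-in for a UNet forward pass plus scheduler update.
    return [v - t for v in batch]


frames = [10.0, 20.0, 30.0, 40.0, 50.0]
print(denoise_all_frames(frames, [3, 2, 1], toy_step, batch_size=2))
# [4.0, 14.0, 24.0, 34.0, 44.0]
```

Because `toy_step` treats each frame independently, the batch size changes only how many frames are in memory at once, not the result. Once frames are coupled through shared features, the batches have to draw on the same per-step shared state for that property to hold.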