mayuelala / FollowYourPose

[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose : Pose-Guided Text-to-Video Generation using Pose-Free Videos"
https://follow-your-pose.github.io/
MIT License

What parts of Cross-Frame Attention have been reformed in your project relative to Tune-A-Video? #20

Closed XuejiFang closed 1 year ago

XuejiFang commented 1 year ago

As the authors mention in the Abstract: "In the second stage, we finetune the motion of the above network via a pose-free video dataset by adding the learnable temporal self-attention and reformed cross-frame self-attention blocks."

Am I right in understanding that the cross-frame attention mentioned in your paper is the SparseCausalAttention class in your open-sourced code, which is the same as the SparseCausalAttention class written in Tune-A-Video? If so, in what way is the cross-frame attention "reformed" in your project, and which part of the code embodies this?

mayuelala commented 1 year ago

Yes, it is the same as the SparseCausalAttention class written in Tune-A-Video. We finetune the SCA on HD-VILA and add LoRA to keep temporal consistency. As for the code, you can find it here
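
For readers unfamiliar with the Tune-A-Video scheme being discussed: in sparse causal (cross-frame) attention, each frame's queries attend to key/value tokens taken from the first frame and the previous frame rather than from the frame itself. The following is a minimal sketch of that key/value construction with NumPy; the function name, shapes, and layout are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def sparse_causal_kv(hidden_states, video_length):
    """Build key/value states for sparse causal (cross-frame) attention.

    hidden_states: array of shape (batch * video_length, seq_len, dim),
    with frames laid out contiguously per batch sample. Each frame draws
    its keys/values from frame 0 and from its previous frame (frame 0
    is clamped to attend to itself). Illustrative sketch only.
    """
    bf, n, d = hidden_states.shape
    b = bf // video_length
    x = hidden_states.reshape(b, video_length, n, d)
    # frame 0 replicated for every frame position
    first = x[:, [0] * video_length]
    # previous frame for each position, clamped at frame 0
    prev = x[:, [0] + list(range(video_length - 1))]
    # concatenate along the token axis: each frame sees 2n kv tokens
    kv = np.concatenate([first, prev], axis=2)  # (b, f, 2n, d)
    return kv.reshape(bf, 2 * n, d)
```

For example, with 3 frames, frame 2's key/value tokens come from frame 0 and frame 1, so temporal structure is injected without full (quadratic-in-frames) attention.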