showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

Pose Control Implementation #54

Closed Shr1ftyy closed 1 year ago

Shr1ftyy commented 1 year ago

Hello, I was wondering how exactly you guys managed to perform "pose control" with Tune-A-Video? To my knowledge, the process hasn't been outlined in the Tune-A-Video paper.

[screenshot attached]

zhangjiewu commented 1 year ago

hi @Shr1ftyy, we discuss pose control in sec. 4 of the paper. intuitively, we use T2I-Adapter as the pretrained T2I model, and do the editing based on the pose condition.
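
for a concrete picture, here is a minimal sketch of pose-conditioned editing with a frozen adapter in diffusers (not code from this repo; the `StableDiffusionAdapterPipeline` usage and the checkpoint names are assumptions):

```python
# a minimal sketch (not from this repo): pose-conditioned image editing
# with a frozen T2I-Adapter in diffusers. checkpoint names are assumptions.
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

# frozen pose adapter pretrained for SD 1.4
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_openpose_sd14v1", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

pose = load_image("pose_map.png")  # an OpenPose skeleton image
image = pipe("iron man is dancing", image=pose,
             num_inference_steps=50, guidance_scale=7.5).images[0]
```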

Shr1ftyy commented 1 year ago

Thanks for getting back! I realized I was reading an older version of the paper 🤦🏾‍♂️. I'll close this issue with this comment.

Shr1ftyy commented 1 year ago

Hello again, I was wondering if you have any plans to release examples that integrate ControlNet-OpenPose, T2I-Adapter, etc. with Tune-A-Video for inference. If so, could you provide an estimate of when they may be released?

Thanks.

zhangjiewu commented 1 year ago

hi @Shr1ftyy, it has actually been on my todo list. however, i was quite packed in the past few weeks and did not manage to get to it. feel free to open a PR if you want to contribute.

fyi, i spotted some follow-up works (e.g., FollowYourPose) that have implemented pose control in a way quite similar to ours.

Shr1ftyy commented 1 year ago

> hi @Shr1ftyy, we discuss pose control in sec. 4 of the paper. intuitively, we use T2I-Adapter as the pretrained T2I model, and do the editing based on the pose condition.

Hi, in the paper you claim:

> Our method can also be integrated with conditional T2I models like T2I-Adapter [29] and ControlNet [52], to enable diverse controls on the generated videos at no extra training cost.

My intuition, plus the above statement, leads me to assume that one does not need to re-finetune a pretrained T2I-Adapter (one already trained on Stable Diffusion 1.5, for example) to control a Stable Diffusion 1.5 model that has been modified and fine-tuned as per the Tune-A-Video paper, in order to achieve the kind of results displayed below (coherent pose-guided imagery):

[screenshot: pose-guided generation results]

Is this assumption correct?

Thanks again.

zhangjiewu commented 1 year ago

yes, there's no need to fine-tune the adapter (control) part. simply fine-tune SD 1.5 as in Tune-A-Video.
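
a rough sketch of the inference-time plumbing (not part of this repo; it assumes you extend the inflated UNet3DConditionModel to accept the `down_block_additional_residuals` / `mid_block_additional_residual` kwargs that the diffusers 2D UNet uses for ControlNet): run a frozen OpenPose ControlNet on each frame's latent and fold its residuals back into the 3D UNet.

```python
# sketch only: frozen ControlNet residuals for a Tune-A-Video latent.
# assumes the inflated 3D UNet was extended to accept the residual kwargs below.
import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")
controlnet.requires_grad_(False)  # frozen: no fine-tuning needed

def controlnet_residuals(latents, t, text_emb, pose_frames):
    # latents: (b, c, f, h, w) video latents; pose_frames: (b, f, 3, H, W)
    b, c, f, h, w = latents.shape
    lat2d = latents.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    pose2d = pose_frames.reshape(b * f, *pose_frames.shape[2:])
    emb = text_emb.repeat_interleave(f, dim=0)  # repeat prompt per frame
    down_res, mid_res = controlnet(
        lat2d, t, encoder_hidden_states=emb,
        controlnet_cond=pose2d, return_dict=False,
    )
    # fold the frame axis back in: (b*f, c', h', w') -> (b, c', f, h', w')
    to3d = lambda x: x.reshape(b, f, *x.shape[1:]).permute(0, 2, 1, 3, 4)
    return [to3d(r) for r in down_res], to3d(mid_res)

# inside the denoising loop (assuming the extended 3D UNet):
# down_res, mid_res = controlnet_residuals(latents, t, text_emb, pose_frames)
# noise_pred = unet(latents, t, encoder_hidden_states=text_emb,
#                   down_block_additional_residuals=down_res,
#                   mid_block_additional_residual=mid_res).sample
```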

Tinaa23 commented 1 month ago

@zhangjiewu Hi. Amazing work. Is it possible to provide example code showing how to use the control mechanism (the adapter, as mentioned in the paper) with your model? It is not clear how to connect a pretrained T2I adapter with pose control to Tune-A-Video.
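
For reference while waiting for an official example, here is my rough understanding of the glue that is needed (a sketch, not tested against this repo; the `controlnet_aux` `OpenposeDetector` is an assumption, while the `TuneAVideoPipeline` / `UNet3DConditionModel` loading follows the README):

```python
import torch
from PIL import Image
from controlnet_aux import OpenposeDetector  # assumption: any OpenPose detector works
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline

# 1. extract per-frame pose maps from the driving video
detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose_maps = [detector(Image.open(f"frames/{i:04d}.png")) for i in range(24)]

# 2. load the fine-tuned Tune-A-Video checkpoint, as in the README
unet = UNet3DConditionModel.from_pretrained(
    "./outputs/man-skiing", subfolder="unet", torch_dtype=torch.float16
)
pipe = TuneAVideoPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", unet=unet, torch_dtype=torch.float16
).to("cuda")

# 3. the stock pipeline has no pose argument, so pose_maps must be fed in
#    through frozen adapter/ControlNet residuals (see the sketch above)
video = pipe("iron man is dancing", video_length=24, height=512, width=512,
             num_inference_steps=50, guidance_scale=12.5).videos
```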