thu-ml / controlvideo

Official implementation for "ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing"
Apache License 2.0

Shape Editing #11

Open Friedrich-M opened 1 year ago

Friedrich-M commented 1 year ago

Hi, thanks for your great work on video editing!

However, when I use the model to change the shape of an object in a video with the canny or HED condition, the result changes only the texture, not the geometry.

The video below shows a failed shape edit on the penguin, with the target prompt "a panda".

https://github.com/thu-ml/controlvideo/assets/85838942/7a209c39-3e6d-45ec-bb15-6e1e3593fd99

Friedrich-M commented 1 year ago

I would like to know whether the model can only edit texture or apply stylization.

gracezhao1997 commented 1 year ago

Hi, Canny edge maps, depth maps, and HED boundaries can't change the shape, whereas pose control can. We can combine them to take advantage of the different control types. The following is an example.

[image]

Friedrich-M commented 1 year ago

Thanks for your explanation. That's really helpful.

However, I'm interested in how the multiple-control editing is implemented. Judging from the Figure 2 you gave, it seems that you first use Grounding-DINO and SAM to get the human mask, and then apply canny control and pose control separately to get two editing results. The final multiple-control result would then be blended, perhaps as (1 - mask) * [canny_control] + mask * [pose_control]. But the shape of the human changes during this process, so I wonder how you mix the two without blur.
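For concreteness, the blend guessed at above can be sketched in a few lines of NumPy. This only illustrates the formula in the question and is not the authors' confirmed pipeline. The function name `blend_with_mask`, the array shapes, and the assumption that the mask is already aligned with both edited videos are all hypothetical.

```python
import numpy as np

def blend_with_mask(canny_frames, pose_frames, mask):
    """Per-pixel blend: (1 - mask) * canny_frames + mask * pose_frames.

    Assumed (hypothetical) shapes:
      canny_frames, pose_frames: (T, H, W, C) edited videos
      mask: (T, H, W), values in [0, 1], 1 inside the human region
    """
    canny_frames = np.asarray(canny_frames, dtype=np.float32)
    pose_frames = np.asarray(pose_frames, dtype=np.float32)
    # Add a trailing channel axis so the mask broadcasts over C.
    mask = np.asarray(mask, dtype=np.float32)[..., None]
    return (1.0 - mask) * canny_frames + mask * pose_frames

# Tiny usage example: one 2x2 RGB frame per video.
canny = np.zeros((1, 2, 2, 3))   # stands in for the canny-controlled result
pose = np.ones((1, 2, 2, 3))     # stands in for the pose-controlled result
mask = np.array([[[0.0, 1.0],
                  [0.5, 0.0]]])  # 1 = take pose result, 0 = take canny result
blended = blend_with_mask(canny, pose, mask)
```

With a hard binary mask the two results meet at a sharp boundary, and a single mask taken from the source video no longer matches the human once the shape has changed, which is the difficulty the question is about.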

Thank you again for your attention!