jianzongwu / MotionBooth

[NeurIPS 2024 Spotlight] The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"

About box control #6

Open Junoh-Kang opened 3 months ago

Junoh-Kang commented 3 months ago

Dear authors,

I have a question and an issue.

Q1. Is it possible to control motion with boxes on a non-fine-tuned model, or is only camera motion controllable? I get strange objects when I use the bbox option with the pre-trained zeroscope model.

I1. When I use num_samples > 1, the bbox values seem to change. It looks like pipeline_motionbooth changes the bbox values along with the camera motion.

jianzongwu commented 3 months ago

Answer for Q1:

We have tested controlling motion with boxes on a non-fine-tuned model. In some cases it can control the movement. However, you should lower the control strength and try more samples, or the quality will be poor.

Answer for I1:

In our implementation, the camera does influence the box sequence. If you look through pipeline_motionbooth, you will see that the box motion and the camera motion are both expressed in an absolute coordinate system. For example, if the box is kept static and the camera moves to the left, the box in the video moves to the right accordingly. We do this because the latent shift is performed during the middle denoising steps, where the approximate layout of the objects is already fixed. If we do not move the object according to the shifted latent, the result quality might be lower. You can try this yourself by removing the corresponding code in def latent_shift() in pipeline_motionbooth.
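To make the absolute-coordinate idea concrete, here is a minimal sketch, not the code from pipeline_motionbooth. The function name `boxes_in_frame`, the pixel units, and the sign convention (a negative x offset means the camera has moved left) are all assumptions made for illustration. It shows a box that is static in absolute coordinates drifting to the right in the frame as the camera moves left.

```python
def boxes_in_frame(abs_boxes, camera_offsets):
    """Convert per-frame boxes from absolute to frame coordinates.

    abs_boxes:      list of (x0, y0, x1, y1), one per frame (hypothetical format)
    camera_offsets: list of cumulative camera displacements (dx, dy), one per frame
    Returns a new list and leaves the inputs unchanged.
    """
    shifted = []
    for (x0, y0, x1, y1), (dx, dy) in zip(abs_boxes, camera_offsets):
        # Subtracting the camera displacement moves the box the opposite way.
        shifted.append((x0 - dx, y0 - dy, x1 - dx, y1 - dy))
    return shifted


# A box that never moves in absolute coordinates, over three frames.
static_boxes = [(40, 40, 60, 60)] * 3
# The camera pans left by 10 pixels per frame (assumed convention: left is negative x).
camera = [(0, 0), (-10, 0), (-20, 0)]

frame_boxes = boxes_in_frame(static_boxes, camera)
```

Because the function returns a new list instead of editing its input, calling it repeatedly with the same `static_boxes` always gives the same result.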

Junoh-Kang commented 3 months ago

I think my description of I1 was misleading. The point is that when I use num_samples > 1, the bbox values for the first and second videos are different. I don't think this is the intended behavior. You can check it by drawing ani at every iteration. As a quick fix, I deep-copied the initial bbox.

jianzongwu commented 3 months ago

Do you mean changing the bbox passed into the pipeline's forward function to bbox.deepcopy()?

Junoh-Kang commented 3 months ago

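No, just in inference.py, right before the pipeline call:

```python
import copy

bbox_ref = copy.deepcopy(bbox)  # keep a copy of the initial bbox
for i in range(num_samples):
    bbox = copy.deepcopy(bbox_ref)  # reset bbox for every sample
    pipe(bbox, ...)
    # the values in bbox and bbox_ref will be different now.
```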

jianzongwu commented 2 months ago

I think the problem might be that the bbox value is changed inside the pipeline, so passing bbox.deepcopy() into the pipeline should work. Your method is fine too.
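The behavior discussed in this thread can be reproduced in isolation with a small sketch. It assumes the boxes are stored as a nested Python list and that the pipeline edits them in place. `fake_pipe` and `run_samples` are hypothetical stand-ins and not part of MotionBooth.

```python
import copy


def fake_pipe(bbox):
    """Hypothetical stand-in for the pipeline: shifts every box in place."""
    for box in bbox:
        box[0] += 10
        box[2] += 10
    return bbox


def run_samples(bbox, num_samples, protect):
    """Record the boxes each sample starts from."""
    seen = []
    for _ in range(num_samples):
        # With protect=True the pipeline only ever sees a fresh deep copy.
        arg = copy.deepcopy(bbox) if protect else bbox
        seen.append(copy.deepcopy(arg))
        fake_pipe(arg)
    return seen


initial = [[40, 40, 60, 60]]
drifting = run_samples(copy.deepcopy(initial), 2, protect=False)
stable = run_samples(copy.deepcopy(initial), 2, protect=True)
```

Without the copy, the second sample starts from boxes the first sample already shifted. With the copy, every sample starts from the same initial boxes.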