showlab / MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
https://showlab.github.io/MotionDirector/
Apache License 2.0

Why are my results so unclear? Has anybody run the experiments? #7

Closed RYHSmmc closed 11 months ago

ruizhaocv commented 11 months ago

Could you please provide your running command and results here? Then we can go through them and find the problem.

XiaominLi1997 commented 11 months ago

I'm also running into the same problem.

  1. Given: prompt: "A person is riding a bicycle past the Eiffel Tower.", seed: 2023, ckpt: ./outputs/train/train_2023-12-02T13-39-36/ (https://huggingface.co/Yhyu13/MotionDirector_LoRA). I got the following result; no person appears in the video. https://github.com/showlab/MotionDirector/assets/25433111/9e1903e3-d13d-4dfa-a774-9b45d55d364d

  2. Given: prompt: "A person is riding a bicycle past the Eiffel Tower.", seed: 7192280, ckpt: ./outputs/train/train_2023-12-02T13-39-36/ (https://huggingface.co/Yhyu13/MotionDirector_LoRA). I got the following result, which is blurry. https://github.com/showlab/MotionDirector/assets/25433111/e2728118-33d1-4aa3-9e8b-9d6ff9b7a66d

ruizhaocv commented 11 months ago

Hi Xiaomin, thanks for the feedback. What about other checkpoints, e.g. the ones trained on a single video (https://github.com/showlab/MotionDirector#motiondirector-trained-on-a-single-video)? Generally, using the same seed as listed in the README will generate the same results as shown.

XiaominLi1997 commented 11 months ago

> Hi Xiaomin, thanks for the feedback. What about other checkpoints, e.g. the ones trained on a single video (https://github.com/showlab/MotionDirector#motiondirector-trained-on-a-single-video)? Generally, using the same seed as listed in the README will generate the same results as shown.

Yep, the results from training on a single video are the same. Thanks!

ruizhaocv commented 11 months ago

Nice. Maybe I mixed up the checkpoints for the riding-bicycle example. Will check that.

XiaominLi1997 commented 11 months ago

> Nice. Maybe I mixed up the checkpoints for the riding-bicycle example. Will check that.

Hi, I found a new problem with training on a single video (prompt: "A person is skateboarding.").

I used the same seed=6668889 and prompt="A panda is skateboarding." during both training and inference.

  1. Sampling a video during training with ckpt-300, the result is pretty good. https://github.com/showlab/MotionDirector/assets/25433111/a35240a6-4b65-41d8-906c-b15f1f300741

However,

  2. Sampling a video during inference with ckpt-300, the result is bad.

https://github.com/showlab/MotionDirector/assets/25433111/ec55528f-6a94-49af-84e6-f79134d4dc58

Could you please check the inference code or look into the cause (maybe the hyper-parameters)? My co-worker and I both ran into the same problem.

Inference hyper-parameters I used:

```
"args": [
    "--model", "/15764332239/pretrained_models/text-to-video-ms-1.7b",
    "--prompt", "A panda is skateboarding.",
    "--checkpoint_folder", "./outputs/train/skateboard-single-video",
    "--checkpoint_index", "300",
    "--noise_prior", "0.5",
    "--seed", "6668889"
],
```
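For reference, those args correspond roughly to this command (assuming the repo's MotionDirector_inference.py entry point):

```bash
python MotionDirector_inference.py \
  --model /15764332239/pretrained_models/text-to-video-ms-1.7b \
  --prompt "A panda is skateboarding." \
  --checkpoint_folder ./outputs/train/skateboard-single-video \
  --checkpoint_index 300 \
  --noise_prior 0.5 \
  --seed 6668889
```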

training hyper-params:

```yaml
pretrained_model_path: "/15764332239/pretrained_models/text-to-video-ms-1.7b"
output_dir: "./outputs/train"
dataset_types:
cache_latents: True
cached_latent_dir: null
use_unet_lora: True
lora_unet_dropout: 0.1
save_pretrained_model: False
lora_rank: 32
train_data:
  width: 384
  height: 384
  use_bucketing: True
  sample_start_idx: 1
  fps: 8
  frame_step: 1
  n_sample_frames: 16
  single_video_path: "./test_data/skateboarding-front/708-75070.avi"
  single_video_prompt: "A person is skateboarding."
validation_data:
  prompt:
learning_rate: 5e-4
adam_weight_decay: 1e-2
max_train_steps: 300
checkpointing_steps: 50
validation_steps: 50
seed: 6668889
mixed_precision: "fp16"
gradient_checkpointing: False
text_encoder_gradient_checkpointing: False
enable_xformers_memory_efficient_attention: True
enable_torch_2_attn: True
```
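For completeness, training was launched with something like the command below (I'm assuming the repo's MotionDirector_train.py entry point; the config path is just where I saved the YAML above):

```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```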

ruizhaocv commented 11 months ago

What does inference with checkpoint_index=150 look like?

XiaominLi1997 commented 11 months ago

> What does inference with checkpoint_index=150 look like?

So good! Why does this phenomenon occur?

https://github.com/showlab/MotionDirector/assets/25433111/93f4ca42-4e9b-4a38-a83c-54bbffc2324e

Sample during training with checkpoint_index=150:

https://github.com/showlab/MotionDirector/assets/25433111/e65d88ec-0f69-49e3-a837-4b3783c9f8e0

The two results above are different.

ruizhaocv commented 11 months ago

For faster convergence, we set a large learning rate, which may cause instability in the late training steps. If you want a more stable but slower training, you can try to reduce the learning rate. Enjoy exploring the optimal hyperparameters for your own training task.
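For example, a more conservative setting in the config you posted could look like this (the values are only illustrative, not tuned):

```yaml
learning_rate: 1e-4   # lower than the 5e-4 used above: slower but more stable
max_train_steps: 600  # the lower rate may need more steps to converge
```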

ruizhaocv commented 11 months ago

Fixing the seed for inference ensures that repeated inference runs generate the same results. However, using the same random seed does not mean you will get exactly the same results at inference as during training, because the random state advances every time it is consumed in the training stage.
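A toy PyTorch sketch of this point (just the global RNG, not MotionDirector's actual sampling code): the generator state moves forward with every draw during training, so the noise sampled at a validation step differs from the noise sampled right after seeding at inference.

```python
import torch

# Seed, then draw noise immediately -- roughly what inference does.
torch.manual_seed(6668889)
noise_at_inference = torch.randn(4)

# Seed, then let "training" consume the RNG before the validation draw.
torch.manual_seed(6668889)
for _ in range(300):
    _ = torch.randn(4)  # every step advances the global RNG state
noise_at_validation = torch.randn(4)

# Same seed, but the draws no longer match.
print(torch.allclose(noise_at_inference, noise_at_validation))  # False
```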

XiaominLi1997 commented 11 months ago

> Fixing the seed for inference ensures that repeated inference runs generate the same results. However, using the same random seed does not mean you will get exactly the same results at inference as during training, because the random state advances every time it is consumed in the training stage.

Thanks, I mistakenly took the seed below to be the validation seed. Actually, it is used in training.

(image attached)

Thanks again for your nice reply.

ruizhaocv commented 11 months ago

Thanks for pointing this out. I have deleted this confusing item.