jianzongwu / MotionBooth

The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"

Inference code #2

Closed jhj7905 closed 2 weeks ago

jhj7905 commented 2 weeks ago

@jianzongwu When I ran the inference code, an error occurred as shown below. Can you tell me how to resolve this error? Thank you in advance.

root@3f9488bc1e29:/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth# python -m scripts.inference --script_path data/scripts/camera/waterfall.json --model_name lavie --num_samples 1 --start_shift_step 10 --max_shift_steps 10
/opt/conda/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
matplotlib data path: /opt/conda/lib/python3.10/site-packages/matplotlib/mpl-data
CONFIGDIR=/root/.config/matplotlib
interactive is False
platform is linux
CACHEDIR=/root/.cache/matplotlib
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/scripts/inference.py", line 148, in <module>
    main(args)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/scripts/inference.py", line 47, in main
    unet = model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 740, in from_pretrained
    model = cls.from_config(config, **unused_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 260, in from_config
    model = cls(**init_dict)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 658, in inner_init
    init(self, *args, **init_kwargs)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/src/models/lavie/unet.py", line 180, in __init__
    down_block = get_down_block(
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/src/models/lavie/unet_blocks.py", line 75, in get_down_block
    raise ValueError(f"{down_block_type} does not exist.")
ValueError: CrossAttnDownBlock2D does not exist.

jianzongwu commented 2 weeks ago

Hello, @jhj7905 I checked the code. It seems there is a missing step in converting LaVie's checkpoint. Specifically, you should also replace the LaVie model's UNet config file checkpoints/lavie/unet/config.json with the following.

{
  "_class_name": "LaVieModel",
  "_diffusers_version": "0.25.0",
  "act_fn": "silu",
  "attention_head_dim": 8,
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "center_input_sample": false,
  "class_embed_type": null,
  "cross_attention_dim": 768,
  "down_block_types": [
    "CrossAttnDownBlock3D",
    "CrossAttnDownBlock3D",
    "CrossAttnDownBlock3D",
    "DownBlock3D"
  ],
  "downsample_padding": 1,
  "dual_cross_attention": false,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock3DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "out_channels": 4,
  "resnet_time_scale_shift": "default",
  "sample_size": 64,
  "up_block_types": [
    "UpBlock3D",
    "CrossAttnUpBlock3D",
    "CrossAttnUpBlock3D",
    "CrossAttnUpBlock3D"
  ],
  "upcast_attention": false,
  "use_first_frame": false,
  "use_linear_projection": false,
  "use_relative_position": false
}
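
If you prefer to script this change, below is a minimal sketch (my own illustration; it assumes the checkpoint is laid out under checkpoints/lavie/ as above, and that the listed fields are the ones that differ from a stock stable-diffusion-v1-4 UNet config; when in doubt, paste the full JSON above verbatim instead):

import json
from pathlib import Path

config_path = Path("checkpoints/lavie/unet/config.json")

# Load the existing UNet config and overwrite the LaVie-specific fields
config = json.loads(config_path.read_text())
config.update({
    "_class_name": "LaVieModel",
    "down_block_types": [
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "CrossAttnDownBlock3D",
        "DownBlock3D",
    ],
    "mid_block_type": "UNetMidBlock3DCrossAttn",
    "up_block_types": [
        "UpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
        "CrossAttnUpBlock3D",
    ],
    "use_first_frame": False,
    "use_relative_position": False,
})
config_path.write_text(json.dumps(config, indent=2))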

I have added this step to README.md. Thanks for the reminder!

Please let me know if you have other questions.

jhj7905 commented 2 weeks ago

@jianzongwu Hello, thank you for the quick reply. By the way, when I ran the inference code as below,

python -m scripts.inference \
  --script_path data/scripts/customized_both/swim_coral.json \
  --model_name lavie \
  --customize_ckpt_path checkpoints/customized/lavie/plushie_happysad.pth \
  --class_name "plushie happysad" \
  --num_samples 1 \
  --edit_scale 10.0 \
  --max_amp_steps 15 \
  --start_shift_step 10 \
  --max_shift_steps 10 \
  --base_seed 5

the quality of the generated video was bad. [attached sample: gs7 5-cs0--64-ss10-20-es10 0-as15-cus-plushie_happysad-s6]

Can you tell me how to solve this issue?

jianzongwu commented 2 weeks ago

Hi, @jhj7905 I found another missing step. You should also replace the scheduler config checkpoints/lavie/scheduler/scheduler_config.json with the following.

{
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.7.0.dev0",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null,
  "clip_sample": false
}
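
As a quick sanity check (a sketch, assuming diffusers is installed and the checkpoint path above), you can confirm that diffusers parses the new scheduler config:

from diffusers import DDPMScheduler

# Loads scheduler_config.json from the given directory
scheduler = DDPMScheduler.from_pretrained("checkpoints/lavie/scheduler")
print(scheduler.config.beta_schedule)       # expect "linear"
print(scheduler.config.num_train_timesteps) # expect 1000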

Also replace checkpoints/lavie/model_index.json with the following.

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.2.2",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "DDPMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
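
After replacing all three files, a short check like the one below (my own sketch; it only assumes the checkpoints/lavie/ layout used above) can confirm that each file parses and names the expected class:

import json
from pathlib import Path

root = Path("checkpoints/lavie")

# Each tuple: (relative path, expected "_class_name" after replacement)
expected = [
    ("unet/config.json", "LaVieModel"),
    ("scheduler/scheduler_config.json", "DDPMScheduler"),
    ("model_index.json", "StableDiffusionPipeline"),
]

for rel_path, class_name in expected:
    data = json.loads((root / rel_path).read_text())
    assert data["_class_name"] == class_name, f"{rel_path}: got {data['_class_name']}"
print("LaVie configs look consistent")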

We need these changes because LaVie requires different configs from stable-diffusion-v1-4.

Please let me know if you have other questions.

jhj7905 commented 2 weeks ago

@jianzongwu Hello, thank you for providing the solution. After applying the changes you described, the quality of the generated video is good, like the example videos. By the way, can you check the code with the Zeroscope model?

jianzongwu commented 2 weeks ago

The Zeroscope model does not need a checkpoint format conversion, so I think you can just use the downloaded version without any modification.
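
For reference, downloading Zeroscope might look like the sketch below (the repo id cerspense/zeroscope_v2_576w and the target directory are my assumptions; check the README for the exact model and path MotionBooth expects):

from huggingface_hub import snapshot_download

# Fetch the Zeroscope weights as-is; no config conversion step is required
snapshot_download(
    repo_id="cerspense/zeroscope_v2_576w",  # assumed repo id
    local_dir="checkpoints/zeroscope",      # assumed target path
)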