Hello, @jhj7905
I checked the code. It seems there is a missing step in converting LaVie's checkpoint. Specifically, you should also replace the LaVie model's UNet config file checkpoints/lavie/unet/config.json with the following:
```json
{
  "_class_name": "LaVieModel",
  "_diffusers_version": "0.25.0",
  "act_fn": "silu",
  "attention_head_dim": 8,
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "center_input_sample": false,
  "class_embed_type": null,
  "cross_attention_dim": 768,
  "down_block_types": [
    "CrossAttnDownBlock3D",
    "CrossAttnDownBlock3D",
    "CrossAttnDownBlock3D",
    "DownBlock3D"
  ],
  "downsample_padding": 1,
  "dual_cross_attention": false,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock3DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "out_channels": 4,
  "resnet_time_scale_shift": "default",
  "sample_size": 64,
  "up_block_types": [
    "UpBlock3D",
    "CrossAttnUpBlock3D",
    "CrossAttnUpBlock3D",
    "CrossAttnUpBlock3D"
  ],
  "upcast_attention": false,
  "use_first_frame": false,
  "use_linear_projection": false,
  "use_relative_position": false
}
```
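If it helps, here is a quick way to sanity-check the replaced file before running inference (a minimal sketch; it only checks the fields that commonly go wrong):

```python
# Minimal sanity check for the replaced UNet config (path from the step above).
import json

with open("checkpoints/lavie/unet/config.json") as f:
    config = json.load(f)

# LaVie's UNet is built from 3D blocks; any leftover 2D block name here will
# make the model fail to build later.
assert config["_class_name"] == "LaVieModel"
assert all(name.endswith("3D") for name in config["down_block_types"])
assert all(name.endswith("3D") for name in config["up_block_types"])
print("unet config looks good")
```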
I have added this step to README.md. Thanks for the reminder!
Please let me know if you have other questions.
@jianzongwu Hello, thank you for the quick reply. By the way, when I ran the inference code below,

```bash
python -m scripts.inference \
    --script_path data/scripts/customized_both/swim_coral.json \
    --model_name lavie \
    --customize_ckpt_path checkpoints/customized/lavie/plushie_happysad.pth \
    --class_name "plushie happysad" \
    --num_samples 1 \
    --edit_scale 10.0 \
    --max_amp_steps 15 \
    --start_shift_step 10 \
    --max_shift_steps 10 \
    --base_seed 5
```

the quality of the generated video was bad. Can you tell me how to solve this issue?
Hi, @jhj7905
I found there is another missing step. You should also replace the scheduler config checkpoints/lavie/scheduler/scheduler_config.json with the following:
```json
{
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.7.0.dev0",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null,
  "clip_sample": false
}
```
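You can verify the scheduler config took effect by loading it directly (a minimal sketch using the standard diffusers loader):

```python
# Minimal sketch: load the replaced scheduler config through diffusers.
from diffusers import DDPMScheduler

scheduler = DDPMScheduler.from_pretrained("checkpoints/lavie/scheduler")
print(scheduler.config.beta_schedule)  # expect "linear"
print(scheduler.config.clip_sample)    # expect False
```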
Also replace checkpoints/lavie/model_index.json with the following:
```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.2.2",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "DDPMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
We need these replacements because LaVie uses different configs from stable-diffusion-v1-4.
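For reference, model_index.json is the file diffusers pipelines read to decide which class to instantiate for each component subfolder, so a stale entry there loads the wrong component. A minimal sketch to inspect what the replaced file will load:

```python
# Minimal sketch: print which (library, class) pair diffusers will build for
# each component listed in the replaced model_index.json.
import json

with open("checkpoints/lavie/model_index.json") as f:
    index = json.load(f)

for name, value in index.items():
    if name.startswith("_"):
        continue  # "_class_name" and "_diffusers_version" are metadata
    library, cls = value
    print(f"{name}: {library}.{cls}")  # e.g. scheduler: diffusers.DDPMScheduler
```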
Please let me know if you have other questions.
@jianzongwu Hello, thank you for providing the solution. After following the steps you described, the quality of the generated video is good, like the provided examples. By the way, can you check the code with the Zeroscope model?
The Zeroscope model does not need a checkpoint format conversion, so I think you can just use the downloaded version without any modification.
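For example, a plain diffusers load should be enough (a minimal sketch; the model id cerspense/zeroscope_v2_576w is my assumption for which Zeroscope checkpoint you downloaded):

```python
# Minimal sketch: Zeroscope already ships in the standard diffusers layout,
# so it loads without any config replacement. The model id below is an
# assumption about which Zeroscope checkpoint is in use.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
```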
@jianzongwu When I ran the inference code, an error occurred like below. Can you tell me how to solve this error? Thank you in advance.
```
root@3f9488bc1e29:/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth# python -m scripts.inference --script_path data/scripts/camera/waterfall.json --model_name lavie --num_samples 1 --start_shift_step 10 --max_shift_steps 10
/opt/conda/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
matplotlib data path: /opt/conda/lib/python3.10/site-packages/matplotlib/mpl-data
CONFIGDIR=/root/.config/matplotlib
interactive is False
platform is linux
CACHEDIR=/root/.cache/matplotlib
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/scripts/inference.py", line 148, in <module>
    main(args)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/scripts/inference.py", line 47, in main
    unet = model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 740, in from_pretrained
    model = cls.from_config(config, **unused_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 260, in from_config
    model = cls(**init_dict)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 658, in inner_init
    init(self, *args, **init_kwargs)
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/src/models/lavie/unet.py", line 180, in __init__
    down_block = get_down_block(
  File "/mnt/vision-nas-01/hyunjo/video_generation_model/MotionBooth/src/models/lavie/unet_blocks.py", line 75, in get_down_block
    raise ValueError(f"{down_block_type} does not exist.")
ValueError: CrossAttnDownBlock2D does not exist.
```