neggles / animatediff-cli

a CLI utility/library for AnimateDiff stable diffusion generation
Apache License 2.0

Can we use mm_sd_v15_v2? #29

Closed · doogyhatts closed this 1 year ago

doogyhatts commented 1 year ago

I managed to get it set up and running on Colab. I changed the motion module to mm_sd_v15_v2.ckpt, but when I try to execute the ToonYou script, the process terminates at "Using generation config".
This does not happen when I am using mm_sd_v15.ckpt.

doogyhatts commented 1 year ago

I found that the supporting file is in the prompt-travel fork.

neggles commented 1 year ago

> I managed to get it set up and running on Colab. I changed the motion module to mm_sd_v15_v2.ckpt, but when I try to execute the ToonYou script, the process terminates at "Using generation config". This does not happen when I am using mm_sd_v15.ckpt.

I actually added the mm_sd_v15_v2 checkpoint a couple of weeks ago, and a day or two ago I added it to the list of motion modules that are auto-downloaded. If you put models/motion-module/mm_sd_v15_v2.safetensors in the "motion_module" field of the config JSON, it should just work.
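
For reference, a minimal sketch of that part of the prompt config JSON, with all other fields omitted:

```json
{
  "motion_module": "models/motion-module/mm_sd_v15_v2.safetensors"
}
```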

doogyhatts commented 1 year ago

Thanks for the reply! I think I know why the process terminated early: there was insufficient system RAM on the Colab free tier.

doogyhatts commented 1 year ago

OK, I tried on a machine with 32 GB of system RAM and loaded the ToonYou and mistoon_anime models. But when it reached the UNet loading step, both models threw the same error.

Traceback (most recent call last) │ /workspace/animatediff-cli/src/animatediff/cli.py:276 in generate

│ 273 │ global pipeline
│ 274 │ global last_model_path
│ 275 │ if pipeline is None or last_model_path != model_config.base.resolve():
│ ❱ 276 │ │ pipeline = create_pipeline(
│ 277 │ │ │ base_model=base_model_path,
│ 278 │ │ │ model_config=model_config,
│ 279 │ │ │ infer_config=infer_config,

│ /workspace/animatediff-cli/src/animatediff/generate.py:58 in create_pipeline

│ 55 │ logger.info("Loading VAE...")
│ 56 │ vae: AutoencoderKL = AutoencoderKL.from_pretrained(base_model, subfolder="vae")
│ 57 │ logger.info("Loading UNet...")
│ ❱ 58 │ unet: UNet3DConditionModel = UNet3DConditionModel.from_pretrained_2d(
│ 59 │ │ pretrained_model_path=base_model,
│ 60 │ │ motion_module_path=motion_module,
│ 61 │ │ subfolder="unet",

│ /workspace/animatediff-cli/src/animatediff/models/unet.py:555 in from_pretrained_2d

│ 552 │ │ state_dict.update(motion_state_dict)
│ 553 │ │
│ 554 │ │ # load the weights into the model
│ ❱ 555 │ │ m, u = model.load_state_dict(state_dict, strict=False)
│ 556 │ │ logger.debug(f"### missing keys: {len(m)}; \n### unexpected keys: {len(u)};")
│ 557 │ │
│ 558 │ │ params = [p.numel() if "temporal" in n else 0 for n, p in model.named_parameters

│ /workspace/animatediff-cli/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:2041 in │ load_state_dict

│ 2038 │ │ │ │ │ │ ', '.join('"{}"'.format(k) for k in missing_keys)))
│ 2039 │ │
│ 2040 │ │ if len(error_msgs) > 0:
│ ❱ 2041 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
│ 2042 │ │ │ │ │ │ │ self.class.name, "\n\t".join(error_msgs)))
│ 2043 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys)
│ 2044

RuntimeError: Error(s) in loading state_dict for UNet3DConditionModel: size mismatch for down_blocks.0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.pos_encoder.pe: copying a param with shape torch.Size([1, 32, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]). size mismatch for down_blocks.0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.1.pos_encoder.pe: copying a param with shape torch.Size([1, 32, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]). size mismatch for down_blocks.0.motion_modules.1.temporal_transformer.transformer_blocks.0.attention_blocks.0.pos_encoder.pe: copying a param with shape torch.Size([1, 32, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]). size mismatch for down_blocks.0.motion_modules.1.temporal_transformer.transformer_blocks.0.attention_blocks.1.pos_encoder.pe: copying a param with shape torch.Size([1, 32, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]).
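
The key detail here is that `strict=False` only tolerates missing or unexpected keys; a shape mismatch on a key that does exist still raises. A minimal standalone sketch (not from the repo) that reproduces the behavior:

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):
    """Stand-in for a motion module holding a positional-encoding buffer."""

    def __init__(self, length: int):
        super().__init__()
        # mirrors pos_encoder.pe: shape (1, max_len, channels)
        self.register_buffer("pe", torch.zeros(1, length, 320))


ckpt = Tiny(32).state_dict()  # checkpoint saved with length 32 (like mm_sd_v15_v2)
model = Tiny(24)              # model built expecting length 24 (like mm_sd_v15)

# strict=False skips missing/unexpected keys, but a size mismatch on a
# matching key still raises RuntimeError, exactly as in the traceback above.
model.load_state_dict(ckpt, strict=False)
# RuntimeError: Error(s) in loading state_dict for Tiny:
#     size mismatch for pe: copying a param with shape torch.Size([1, 32, 320]) ...
```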

neggles commented 1 year ago

So it turns out that while I thought I'd tested the v2 module, I had not, and it has slightly bigger dims: a 32-frame positional encoding where v1 uses 24.

Welp.

Lemme fix that...

neggles commented 1 year ago

Fixed in 1911af251aba694b4fd0adc8458ca43fb4f46863 😄
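
One plausible shape for such a fix (purely illustrative; the actual change is in the commit above, and the parameter names in the repo may differ) is to size the temporal positional encoding from the checkpoint instead of hardcoding 24:

```python
import torch
from safetensors.torch import load_file

# Illustrative only: read the positional-encoding length from the motion
# module itself, so v1 (24-frame) and v2 (32-frame) checkpoints both load.
motion_state_dict = load_file("models/motion-module/mm_sd_v15_v2.safetensors")
pe_key = next(k for k in motion_state_dict if k.endswith("pos_encoder.pe"))
max_len = motion_state_dict[pe_key].shape[1]  # 32 for mm_sd_v15_v2, 24 for mm_sd_v15
# ...then pass max_len through to the UNet's motion-module construction.
```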