Don-Chad opened this issue 1 year ago
I cherry-picked an awesome idea from https://github.com/dajes/AnimateDiff. It's in the devel branch. Still working on it.
PR: https://github.com/guoyww/AnimateDiff/pull/25
Changing the pe size requires retraining the model. Too expensive for me.
Yes, this combination would be a perfect approach! I would be happy to do new trainings and provide the GPU power for them. We could also start with smaller models initially.
Would you be able to make a model which does 52 motion frames? It would be very dope to have longer videos! @tumurzakov
@Don-Chad I increased it to 48 (24*2) by doubling the pe tensors from the original module and trained for 1000 steps. It works well. It's better than training from scratch.
The main problem is not GPU power but the dataset.
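For anyone who wants to try this without digging through the branch, here is a minimal sketch of what "doubling the pe tensors" could look like. It assumes the checkpoint is a flat state dict whose positional-encoding buffers end in pos_encoder.pe and have shape [1, max_len, channels] (as the error messages later in this thread suggest); the file name and multiplier are placeholders, and the exact devel-branch code may differ.

import torch

def expand_positional_encodings(state_dict, multiplier=2):
    """Repeat each pos_encoder.pe buffer along the temporal (frame) axis."""
    expanded = {}
    for key, tensor in state_dict.items():
        if key.endswith("pos_encoder.pe"):
            # pe has shape [1, max_len, channels]; tile the time dimension
            tensor = tensor.repeat(1, multiplier, 1)
        expanded[key] = tensor
    return expanded

# placeholder path; load the expanded dict into a UNet built for the longer length
motion_sd = torch.load("mm_sd_v15.ckpt", map_location="cpu")
motion_sd = expand_positional_encodings(motion_sd, multiplier=2)

A short fine-tune (the ~1000 steps mentioned above) then adapts the repeated encodings, which is the "better than training from scratch" part.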
Wow! Would you please share the pipeline_animation which is doubled? (Sorry, I cannot find how to do this.)
I would love to work on the dataset. I have a lot of good varied content with labels. Happy to share a new motion module.
@Don-Chad Very simple. The code is in the devel branch.
Trained 96 frames on an A100 for 1000 steps (20 minutes). It took 21 GB of VRAM. It seems an A100 could train up to 184 frames. Inference on the A100 took 20 GB of VRAM.
But at that frame count there could be problems with the pe. In AnimateDiff the pe comes from the NLP transformer. Possibly we could try ViT positional encodings there to encode longer videos.
Just for fun, 48 frames on the 96-frame model.
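On the ViT positional-encoding idea above: a common trick with ViT-style (learned) positional embeddings is to interpolate them to a new sequence length rather than repeating them. A rough sketch of that resize with illustrative shapes; this is the general technique, not the AnimateDiff code:

import torch
import torch.nn.functional as F

def resize_positional_embedding(pe, target_len):
    """pe: [1, old_len, channels] -> [1, target_len, channels] via linear interpolation."""
    pe = pe.permute(0, 2, 1)                                    # [1, channels, old_len]
    pe = F.interpolate(pe, size=target_len, mode="linear", align_corners=False)
    return pe.permute(0, 2, 1)                                  # [1, target_len, channels]

pe_24 = torch.randn(1, 24, 320)                                 # stand-in for a 24-frame table
pe_96 = resize_positional_embedding(pe_24, 96)                  # stretch to 96 frames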
Thanks kindly for sharing! Just one line makes a difference :-)
Good to see it works. Let me give it a try.
@tumurzakov What difference do you think ViT positional encodings could make here?
I can't seem to use the motion_module_pe_multiplier feature.
motion_module: models\Motion_Module\mmv1.5.pth
output_dir: models\Motion_Module\fff2
train_data:
  video_path: data/fff2.mp4
  prompt: girl
  n_sample_frames: 48
  width: 512
  height: 512
  sample_start_idx: 0
  sample_frame_rate: 1 # sampler stride (how many frames it skips; sample_frame_rate: 4 would advance 4 frames each step)
validation_data:
  prompts:
    - girl
  video_length: 48
  temporal_context: 200
  width: 512
  height: 512
  num_inference_steps: 20
  guidance_scale: 5
  use_inv_latent: true
  num_inv_steps: 40
learning_rate: 3.0e-05
train_batch_size: 1
max_train_steps: 1000
checkpointing_steps: 100
validation_steps: 100
train_whole_module: false
trainable_modules:
  - to_q
seed: 34
mixed_precision: fp16
use_8bit_adam: false
gradient_checkpointing: true
enable_xformers_memory_efficient_attention: true
motion_module_pe_multiplier: 2
File "G:\tuneavid\AnimateDiff\train.py", line 417, in <module>
main(**OmegaConf.load(args.config))
File "G:\tuneavid\AnimateDiff\train.py", line 133, in main
missing, unexpected = unet.load_state_dict(motion_module_state_dict, strict=False)
File "G:\anaconda3\envs\tuneavid\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet3DConditionModel:
size mismatch for down_blocks.0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.pos_encoder.pe: copying a param with shape torch.Size([1, 48, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]).
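The size mismatch reads as: the multiplied checkpoint now carries a 48-frame pe table, but the UNet being built still expects 24 frames, so only one side got scaled. A quick way to see what frame length a checkpoint actually carries before calling load_state_dict is to print the pe shapes; a small diagnostic sketch, assuming the file is a flat state dict as the traceback suggests (the path is a placeholder):

import torch

state_dict = torch.load("mmv1.5.pth", map_location="cpu")
for key, tensor in state_dict.items():
    if key.endswith("pos_encoder.pe"):
        print(key, tuple(tensor.shape))   # e.g. (1, 24, 320) means a 24-frame table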
Here is my config for 264 frames
pretrained_model_path: /content/animatediff/models/StableDiffusion/
motion_module: /content/animatediff/models/Motion_Module/mm_sd_v15.ckpt
motion_module_pe_multiplier: 11
inference_config_path: /content/drive/MyDrive/AI/video/videos/couplet2/train-full-256/valid.yaml
start_global_step: 0
output_dir: /content/drive/MyDrive/AI/video/videos/couplet2/train-full-256
dataset_class: FramesDataset
train_data:
  samples_dir: /content/drive/MyDrive/AI/video/videos/couplet2/dataset256
  prompt_map_path: /content/drive/MyDrive/AI/video/videos/couplet2/prompt_map.json
  video_length: 264
  width: 480
  height: 272
validation_data:
  prompts:
    - standing face girl
  video_length: 264
  width: 480
  height: 272
  temporal_context: 264
  num_inference_steps: 10
  guidance_scale: 12.5
  use_inv_latent: true
  num_inv_steps: 50
learning_rate: 3.0e-05
train_batch_size: 1
max_train_steps: 2000
checkpointing_steps: 100
validation_steps: 10000
train_whole_module: true
trainable_modules:
  - to_q
seed: 33
mixed_precision: fp16
use_8bit_adam: false
gradient_checkpointing: true
enable_xformers_memory_efficient_attention: true
Take a look at the train_data section:
train_data:
  samples_dir: /content/drive/MyDrive/AI/video/videos/couplet2/dataset256
  prompt_map_path: /content/drive/MyDrive/AI/video/videos/couplet2/prompt_map.json
  video_length: 264   # <---- missed
  width: 480
  height: 272
It works!
Thanks for sharing this.
Any idea how we could change the video length to something like 32 or 48? Longer motion would be great. At the moment it seems to be capped at 24.
It would be fine to start over, instead of continuing from the existing motion module.
The error I am getting now is:
File "g:\content\animatediff\animatediff\models\motion_module.py", line 244, in forward x = x + self.pe[:, :x.size(1)] RuntimeError: The size of tensor a (32) must match the size of tensor b (24) at non-singleton dimension 1