Open Cubey42 opened 1 year ago
I've tried a couple of different methods, schedulers, and learning rates, but nothing seems to have any noticeable impact, to the point where I'm not entirely sure it's training the motion module at all. Do you have any examples of successful finetuning?
Yes.
Trained 1000 steps on 2 videos with temporal_position_encoding_max_len=32 and it learned motion. Not perfect, but it works.
Hi, did you tune only the 'to_q' params here, or the whole motion module?
Whole module
Thanks for your reply. I tried to finetune the motion module following your training code, extending it to train on my video dataset instead of one video. I found that the results of the fine-tuned model kept getting worse as training went on, and it eventually produced unreasonable clips. Did you use this code for fine-tuning, and how did you set hyperparameters such as the learning rate?
I was wrong. to_q only. @patrolli
I'm working on stylizing one video, and I deliberately overfit the module to one motion. So we are solving different problems. I'm still experimenting.
Copy. Tuning the entire motion module always produces worse results and significantly harms the original performance. I have also tried tuning only 'to_q', but the results seem only slightly better than tuning the entire module :(
interesting, could I copy your .yaml for testing?
Sure, i'll attach it tomorrow.
Hey @tumurzakov, should all videos in the training dataset have the same number of frames?
No, the dataset loader will take as many frames as it needs. The main requirement is that dataset videos must contain MORE frames than video_length.
@Cubey42 nothing special. This is the config used for training the 96-frame model:
pretrained_model_path: /content/animatediff/models/StableDiffusion/
motion_module: /content/drive/MyDrive/AI/video/videos/intro2/train/mm-100.pth
motion_module_pe_multiplier: 1
inference_config_path: /content/drive/MyDrive/AI/video/videos/intro2/infer/valid.yaml
start_global_step: 0
output_dir: /content/drive/MyDrive/AI/video/videos/intro2/train
train_data:
  video_path:
    - /content/drive/MyDrive/AI/video/videos/intro2/dataset/0.mp4
    - /content/drive/MyDrive/AI/video/videos/intro2/dataset/1.mp4
  prompt:
    - fly over mist
    - fly over mist
  n_sample_frames: 96
  width: 480
  height: 272
  sample_start_idx: 0
  sample_frame_rate: 1
validation_data:
  prompts:
    - fly over mist
    - fly over mist
  video_length: 96
  width: 480
  height: 272
  temporal_context: 96
  num_inference_steps: 20
  guidance_scale: 12.5
  use_inv_latent: true
  num_inv_steps: 50
learning_rate: 3.0e-05
train_batch_size: 1
max_train_steps: 1000
checkpointing_steps: 100
validation_steps: 10000
trainable_modules:
  - to_q
seed: 33
mixed_precision: fp16
use_8bit_adam: false
gradient_checkpointing: true
enable_xformers_memory_efficient_attention: true
what is this param for? n_sample_frames: 96
@aartykov position encoding size. The motion module was trained for 24 frames, so it can't generate more than 24 frames at once. I increased it to 96 and fine-tuned it. Look at another issue, #4.
Just to confirm, the dataset loader is just grabbing the frames it needs, correct? Or do longer videos with more frames give it more data? (Do we want short 16-frame videos only for 16-frame training, or is there a benefit to going to 64 frames, etc.?)
The loader grabs only video_length frames from each video. If you need more frames from one video, load this video multiple times with a different starting index.
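A minimal sketch of the frame selection described above, assuming the loader strides through the decoded video and truncates to the requested length (the pattern used in Tune-A-Video style loaders). The helper name `sample_indices` is hypothetical; its parameters are named after the config keys `sample_start_idx`, `sample_frame_rate`, and `n_sample_frames`, and the actual loader in the repo may differ.

```python
def sample_indices(total_frames, n_sample_frames, sample_start_idx=0, sample_frame_rate=1):
    """Return the frame indices one dataset entry would read (hypothetical helper)."""
    # Walk the video from the start index with the given stride,
    # then keep only as many frames as the model needs.
    indices = list(range(sample_start_idx, total_frames, sample_frame_rate))
    indices = indices[:n_sample_frames]
    if len(indices) < n_sample_frames:
        raise ValueError(
            f"video too short: got {len(indices)} frames, need {n_sample_frames}"
        )
    return indices


# The same 100-frame video listed twice with different starting indices
# yields two non-overlapping 16-frame training samples.
first = sample_indices(100, 16, sample_start_idx=0)
second = sample_indices(100, 16, sample_start_idx=16)
print(first[0], first[-1], second[0], second[-1])
```

Under this assumption, extra frames beyond the selected window are simply never read, which matches the statement that longer videos do not add data unless they are listed again with another start index.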
Okay, thought so. I'm having slightly better results with your config, so I will probably try some more. I've been trying to train on a 512x768 video @ 16 frames. Do you think I should lower it to fit into 512 for better results (like 256x512 instead of 512x768)?
Try 512x512 because that is the size the unet was trained on. Also, train on motions. For example, for a video of Jordan with a ball, train it as "man is dribbling". If somebody is walking, train it as "walking". If you need to train some rare motion, use a rare token (sks, for example).
@tumurzakov Thanks for sharing again :-) I noticed you are training with two videos (1.mp4 and 1.mp4). Are you training two videos with the same prompt on purpose (I guess to get variety)?
I have a tiny problem. My dataset consists of small video clips. Each clip has a minimum of 16 frames. After resizing the frames, I convert them to 4 fps clips with a stride of 4. However, when I play the mp4 files, the video plays so fast that I can't even recognize the frames. Do you have any idea why? @tumurzakov
https://github.com/tumurzakov/AnimateDiff/assets/18645902/b8c4b33e-8550-44c7-a9ec-8fc769db4c62
@Don-Chad I need a cyberpunk video of flying over mist with skyscrapers, something like the opening scene of Blade Runner. For my own purposes.
Maybe it's just not good with character motion? I've tried different labels and such, and I've added more videos, but it doesn't really seem to have an impact. The couple of times I did create samples, they looked good, but once it's the .pth I just don't see any of that.
@Cubey42 try to train the whole module, not the to_q layers only. Take a look here, a minor change is needed.
I trained the whole module on the skss token for 1000 steps and it just reconstructs the video sample that I used.
@tumurzakov
Sure, happy to help. I have a first selection in a drive folder. Can you send me an email? Then I can share it -> mark at dopamine.amsterdam
so is the change all I need to do, or should I remove to_q from the config?
@Cubey42 change line
if "motion_modules" in name and name.endswith(tuple(trainable_modules)):
to
if "motion_modules" in name:
It will train the whole module.
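The effect of that one-line change can be sketched on plain strings. This is an illustration under assumptions: the helper `is_trainable`, the `whole_module` flag, and the module names below are hypothetical, and only the two conditions themselves are taken from the lines quoted above.

```python
def is_trainable(name, trainable_modules, whole_module=False):
    """Decide whether a named submodule should be unfrozen (hypothetical helper)."""
    in_motion_module = "motion_modules" in name
    if whole_module:
        # Changed condition: everything inside the motion module is trained.
        return in_motion_module
    # Original condition: only motion-module layers whose name ends with
    # one of the configured suffixes (for example "to_q").
    return in_motion_module and name.endswith(tuple(trainable_modules))


# Hypothetical module names, for illustration only.
names = [
    "down_blocks.0.motion_modules.0.temporal_transformer.attention_blocks.0.to_q",
    "down_blocks.0.motion_modules.0.temporal_transformer.attention_blocks.0.to_k",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
]

to_q_only = [n for n in names if is_trainable(n, ["to_q"])]
whole = [n for n in names if is_trainable(n, ["to_q"], whole_module=True)]
print(len(to_q_only), len(whole))
```

With the changed condition the `trainable_modules` list no longer affects the selection, so leaving `to_q` in the config should be harmless under this reading.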
okay I'll try this, thanks!
After some more testing with this, I'm noticing an improvement in composition, but now it has me thinking... it seems like the framerate of the data is either too fast or too slow. There seems to be some minor motion present, but if I want it to be faster, should I increase sample_frame_rate?
Hey! I want my model to learn cartoon-style motion, so I prepared small video clips from cartoon videos. Do you suggest training the whole motion module with the entire dataset or with just a few clips? @tumurzakov
could you also share your training loss graph?
@tumurzakov I've had more success training the whole module, thank you. I have a large dataset that was already built for Text-To-Video-Finetuning. Is it possible to use a different dataset loader (VideoJsonDataset)?
Hi, could you share your generated examples after training the whole module? Many thanks.
What training method did you use here, LoRA or DreamBooth? @tumurzakov
@patrolli https://arxiv.org/pdf/2307.04725.pdf
During experiments, we discovered that using a diffusion schedule slightly different from the original schedule where the base T2I model was trained helps achieve better visual quality and avoid artifacts such as low saturability and flickering. We hypothesize that slightly modifying the original schedule can help the model better adapt to new tasks (animation) and new data distribution. Thus, we used a linear beta schedule, where β_start = 0.00085 and β_end = 0.012, which is slightly different from that used to train the original SD.
Change file /content/animatediff/models/StableDiffusion/scheduler/scheduler_config.json to
{
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_end": 0.012,
  "beta_schedule": "linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null,
  "clip_sample": false
}
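To see what this scheduler change does numerically, here is a minimal sketch comparing the "linear" schedule with the "scaled_linear" schedule that the original Stable Diffusion scheduler config uses with the same beta_start and beta_end. The helper names are hypothetical and the formulas follow the usual definitions (scaled_linear is linear in sqrt(beta), then squared); this is an illustration, not code from the repo.

```python
def linear_betas(beta_start, beta_end, num_train_timesteps):
    """Betas spaced evenly between beta_start and beta_end."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def scaled_linear_betas(beta_start, beta_end, num_train_timesteps):
    """Betas spaced evenly in sqrt(beta), then squared."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def final_alpha_cumprod(betas):
    """Product of (1 - beta_t): the signal fraction left at the last timestep."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
    return prod


linear = linear_betas(0.00085, 0.012, 1000)
scaled = scaled_linear_betas(0.00085, 0.012, 1000)

# Both schedules share their endpoints; they differ in between.
print(linear[500], scaled[500])
print(final_alpha_cumprod(linear), final_alpha_cumprod(scaled))
```

The linear schedule sits above the scaled one at every interior timestep, so it adds noise faster and leaves less of the original signal at the final timestep.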
Is this for training only, or for all AnimateDiff work?
@tumurzakov I'm not very familiar with all the modules, but would you happen to know which module handles the style/colors? I'd like to exclude it while training the other modules.
My observation about the motion module training process so far is that the code perfectly overfits when you train with one video clip. I guess it gives better results if your clip includes a specific motion and is long enough.
The main drawback is that it also learns the texture, color, and other stuff...
Yeah, my best success has been with 2 identical videos. I also feel that changing sample_frame_rate has helped with faster motions, but I haven't quite understood the ideal setting. Increasing it also speeds up training.
sample_frame_rate decreases the fps, which is why it helps with faster motion.
Do you have a preference in your finetuning? I found low values like 1 cause no movement, while somewhere in the 12~15 range seems to give me the most motion.
Since my video clips are very short, I can only use 1.
And you get decent motion with 1? I don't understand why it feels like I get no movement at 1.
try using higher fps video with this parameter set to 1
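The trade-off discussed here can be put into numbers. A minimal sketch, assuming sample_frame_rate acts as a frame stride (every Nth frame is kept); the helper names and the example fps values are hypothetical.

```python
def effective_fps(source_fps, sample_frame_rate):
    """Frame rate the model effectively sees after striding."""
    return source_fps / sample_frame_rate


def clip_seconds(n_sample_frames, source_fps, sample_frame_rate):
    """Real-time duration covered by one training sample."""
    return n_sample_frames * sample_frame_rate / source_fps


# A 16-frame sample from a 24 fps source (example values, not from the thread).
for rate in (1, 12):
    print(rate, effective_fps(24, rate), clip_seconds(16, 24, rate))
```

Under this assumption a stride of 1 covers well under a second of footage per 16-frame sample, which may be why little motion is visible, while a stride of 12 covers several seconds.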
Guys, do you have any results so far? @tumurzakov @Cubey42
@aartykov I'm finetuning for style transfer. For example, I needed a cyberpunk driving video. I finetuned with 1000 steps on Manhattan driving videos, and it works as I need.
Looks awesome! How many videos does your dataset include? And how long is each video? @tumurzakov
@aartykov I'm cutting 16-frame videos from a bigger one. I'm using a 1:1 ratio for training steps. It works better if the motions are the same in most of the videos. If I need two motions in a video, I fine-tune one motion and then the other separately. I save checkpoints every 100 steps and after training choose the one that best fits my purpose. Often I use a 300-500 step checkpoint from a 1000-step training run. Sometimes 1000 steps results in horrible overfit, other times it works well. I don't know why; maybe when there are more small details, training tends to overfit.
got you. Btw may I add you on Linkedin?