tumurzakov / AnimateDiff

AnimationDiff with train

How about the generated video quality when using more than 100 frames for training? #19

Open junwenxiong opened 3 months ago

junwenxiong commented 3 months ago

How is the quality when training with more than 100 frames?

tumurzakov commented 3 months ago

I concentrated on training at 48 frames and achieved quite good results. The model becomes much smoother than the existing 16- or 24-frame models. I chose 48 frames because that is the limit for my 24 GB card at 720x480 resolution. I got 320 frames on an A100 at 512x288, and that is very expensive.
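A rough back-of-the-envelope illustration (my assumptions, not taken from this repo) of why the frame count hits VRAM so hard: the motion module's temporal self-attention runs over the frame axis at every spatial position of the latent, so its score tensor grows quadratically with the number of frames.

```python
# Back-of-the-envelope estimate of the temporal-attention score tensor in an
# AnimateDiff-style motion module. Assumptions (mine, illustrative only):
# 8x VAE downsampling, fp16 activations, 8 heads, dense (positions, heads, F, F) scores.

def temporal_attn_score_bytes(width, height, frames, heads=8, bytes_per_el=2):
    lat_w, lat_h = width // 8, height // 8   # latent spatial size after the VAE
    positions = lat_w * lat_h                # temporal attention runs per spatial position
    return positions * heads * frames * frames * bytes_per_el

for frames in (16, 24, 48):
    mib = temporal_attn_score_bytes(720, 480, frames) / 2**20
    print(f"{frames:3d} frames @ 720x480: ~{mib:.0f} MiB of scores per attention layer")
```

The real limit is of course all the other activations kept for backprop across every block, but the quadratic trend in the frame count is the point.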

I quite often train on videos of 100+ frames, but now I use tiles and add a frameN word to the conditioning. I tried another version with 3D conditioning as HxWxF for training on HD video with tiles, but it is too expensive. It is much better to infer at 1280x720 and then use super-resolution.
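I am not sure of the exact implementation, but a minimal sketch of what "add a frameN word to the conditioning" could look like at the caption level (the token format and the 48-frame clip length are my assumptions):

```python
# Hypothetical caption builder: prepend a token that encodes how many frames
# the training clip (or tile) covers, so the model can condition on it.
def build_caption(base_caption: str, num_frames: int) -> str:
    return f"frame{num_frames} {base_caption}"

print(build_caption("a cat walking through snow", 48))
# -> "frame48 a cat walking through snow"
```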

And it is better to use LoRA than to train the model directly, because of catastrophic forgetting, but that is obvious.
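A minimal sketch of the LoRA route with the peft library; the toy module and the target names (`to_q`, `to_k`, `to_v`, as used by diffusers attention blocks) are stand-ins for the real UNet / motion module, not the exact setup used here:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for an attention block; in practice you would wrap the real
# (motion-module-equipped) UNet instead.
class ToyAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

model = ToyAttention()
config = LoraConfig(r=16, lora_alpha=16, target_modules=["to_q", "to_k", "to_v"])
model = get_peft_model(model, config)

# Only the low-rank A/B matrices are trainable; the base weights stay frozen,
# which is what protects the pretrained knowledge from catastrophic forgetting.
model.print_trainable_parameters()
```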

tumurzakov commented 3 months ago

I forgot to mention: the 48-frame model I trained first at 256x144 and then at 512x288, and the same for the 96-frame model, for ~100k steps. The 96-frame model didn't allow using any other extensions such as ControlNet or IP-Adapter because of the memory limit. In another project of mine, latentflow, I have since implemented model RAM offload, so I think it would fit now, but I don't need it. The 48-frame model covers all my needs now.
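For reference (this is the standard diffusers mechanism, not the latentflow implementation), offloading sub-models to CPU between their forward passes is what frees VRAM for extras like ControlNet or IP-Adapter at inference time; the checkpoint names below are the public examples from the diffusers docs:

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16)

# Moves each sub-model (text encoder, UNet, VAE) to the GPU only while it runs.
pipe.enable_model_cpu_offload()
# Even more aggressive (and slower): offload at the submodule level.
# pipe.enable_sequential_cpu_offload()
```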

tumurzakov commented 3 months ago

About quality.

My trained models can't infer anything useful without extensions like ControlNet or a special LoRA, but I don't need that. Mostly I use AnimateDiff for video stylization. My models are much smoother than adv3, for example, because adv3 was trained on 24 frames; still, adv3 is much better trained and has more versatile output.

My training dataset is not that big, ~5000 videos. It is hard to build such a dataset because of scene cuts and the lack of descriptions. Cuts are a very big problem; I spent a lot of time cleaning the dataset of cuts.
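One way to automate the cut cleaning is scene detection; here is a sketch with PySceneDetect (the tool, threshold, and file name are my assumptions, not necessarily what was used for this dataset):

```python
# pip install scenedetect[opencv]
from scenedetect import detect, ContentDetector
from scenedetect.video_splitter import split_video_ffmpeg

video_path = "raw_clip.mp4"  # hypothetical input file
scenes = detect(video_path, ContentDetector(threshold=27.0))

# Each scene is a (start, end) timecode pair; splitting on these boundaries
# gives cut-free clips that can then be filtered by length and captioned.
for start, end in scenes:
    print(f"scene: {start.get_timecode()} -> {end.get_timecode()}")

split_video_ffmpeg(video_path, scenes)  # one output file per detected scene
```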