kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

Repos for Training and Finetuning (1 already available!) #48

Open kolabearafk opened 1 year ago

kolabearafk commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

Is there any released training code or published paper mentioning the training methods used for this model?

Proposed workflow

N/A

Additional information

No response

ExponentialML commented 1 year ago

I can take a shot at it and see if this works with the currently available implementations floating around.

If we're just training the CrossAttention layers (finetuning the Pseudo Conv3D layers is tricky) and limiting the size to 256x256, it may (this is a big if) fit in 24GB of VRAM.

Also, I don't know whether they trained with a DDPM scheduler or the Gaussian Diffusion scheduler, as I don't know the corresponding paper for this implementation. It seems to be a mix of video diffusion and Make-A-Video.

Either way, the process should be very simple if we reference the training methods we have floating around.

  1. Add noise to video latents based on timestep.
  2. Forward through 3D conditional unet with the noisy latents.
  3. Calculate the loss between the model prediction and the noise that was added to the latents (assuming the usual noise-prediction objective).
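
The three steps above can be sketched roughly as follows. This is a dependency-free illustration, not the actual training code: it assumes a DDPM-style linear beta schedule and a noise-prediction objective (the thread itself notes the real scheduler is unknown), and `toy_model`, `training_step`, and the flat list standing in for video latents are hypothetical names invented for this example. A real run would use tensors and the 3D conditional UNet instead.

```python
import math
import random

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alphas_cumprod, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod

def add_noise(latents, noise, t, alphas_cumprod):
    """Step 1: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
            for x, e in zip(latents, noise)]

def mse(pred, target):
    """Mean squared error between two equally sized flat lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

def training_step(model, latents, prompt_embedding, alphas_cumprod, rng):
    """One loss computation; the optimizer update is left out of this sketch."""
    t = rng.randrange(len(alphas_cumprod))                # random timestep
    noise = [rng.gauss(0.0, 1.0) for _ in latents]        # sampled Gaussian noise
    noisy = add_noise(latents, noise, t, alphas_cumprod)  # step 1
    pred = model(noisy, t, prompt_embedding)              # step 2
    return mse(pred, noise)                               # step 3

def toy_model(noisy_latents, t, prompt_embedding):
    """Hypothetical stand-in for the 3D conditional UNet: always predicts zero noise."""
    return [0.0 for _ in noisy_latents]

rng = random.Random(0)
alphas_cumprod = make_alphas_cumprod()
latents = [rng.gauss(0.0, 1.0) for _ in range(64)]  # flattened stand-in for video latents
loss = training_step(toy_model, latents, None, alphas_cumprod, rng)
print(f"loss: {loss:.4f}")
```

Passing `None` as the prompt embedding here also mirrors the unconditional case mentioned below, where no text conditioning is supplied.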

I'm also curious whether, since the model has already seen a sufficient amount of data, you could finetune it in an unconditional way (no prompts, just video data).

ExponentialML commented 1 year ago

I created a repository for Text2Video finetuning here using the recent Diffusers addition. Let me know how it goes if you give it a shot!

https://github.com/ExponentialML/Text-To-Video-Finetuning

kabachuha commented 1 year ago

Incredible! @ExponentialML, I'll post it on Reddit if you don't mind?

Upd: posted here https://www.reddit.com/r/StableDiffusion/comments/11zhy1b/wake_up_samurai_modelscope_text2video_finetuning/

kolabearafk commented 1 year ago

@ExponentialML Wow, truly amazing. Can't wait to try it. Thank you!

ExponentialML commented 1 year ago

@kabachuha Didn't realize you posted it. All good, thanks for doing it!

23Rj20 commented 5 months ago

@ExponentialML Hey, can you please look at this error? When finetuning, it is not able to locate the files even though they are present in that folder. Please look at this issue; I need an urgent fix for this.

[Screenshots attached: lorafileslocation, lorafileslocation2, loadinglora, errorgen, errorreason]

I have uploaded the necessary screenshots to understand the error. @kabachuha Can you also take a look at this, please?