hotshotco / Hotshot-XL

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
https://hotshot.co
Apache License 2.0

Json for Diffusers download #10

Closed tin2tin closed 8 months ago

tin2tin commented 8 months ago

With a few edits to the JSON I was able to get Diffusers to download the model, but I realized it was far too big to run on my hardware locally. Have you tried the compression techniques used in the Wuerstchen project? https://huggingface.co/warp-ai/wuerstchen

Here's the JSON working with Diffusers:

```json
{
  "_class_name": "HotshotXLPipeline",
  "_diffusers_version": "0.21.4",
  "scheduler": [
    "diffusers",
    "EulerAncestralDiscreteScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_2": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "hotshot_xl",
    "UNet3DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
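
For reference, here is a minimal loading sketch once this model_index.json is in place. The import path and class name are assumptions based on the repo layout (hotshot_xl/pipelines), so adjust them to your checkout; fp16 is used only to keep the footprint down.

```python
# Loading sketch, assuming the hotshot_xl package from this repo is installed
# and exposes HotshotXLPipeline at the path below (assumed from the repo layout).
import torch
from hotshot_xl.pipelines.hotshot_xl_pipeline import HotshotXLPipeline

pipe = HotshotXLPipeline.from_pretrained(
    "hotshotco/Hotshot-XL",      # or a local folder containing the edited model_index.json
    torch_dtype=torch.float16,   # fp16 roughly halves memory use vs. fp32
).to("cuda")
```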
aakashs commented 8 months ago

Wuerstchen looks very cool, but I haven't read much about it beyond the abstract! Hotshot-XL was trained to work with SDXL because of its balance of fidelity and prompt understanding.

There are some tips for getting Hotshot-XL running locally on low-power hardware in the Discord: https://discord.gg/85pqA3GG - join us!

tin2tin commented 8 months ago

Thank you. I've already joined, and what I tried generating online looks really great. However, I don't see anything there about how to bring down the VRAM requirements.

I only mentioned Wuerstchen because they found a way to compress the model, which makes it much smaller, faster, and more hardware-friendly; maybe applying those compression techniques to your project would make it more lightweight as well?

Have you joined the Carmenduru Discord server (not mine)? There is a very active TXT2VID community there.

johnmullan commented 8 months ago

We'll be adding VRAM requirements to the README soon! There's a new --low_vram_mode argument in the HotshotXLPipeline now, which may help. All it does is move the text encoder and VAE to the CPU when they are not needed. Memory peaks at 7.6 GB in fp16 while the UNet is running, but this still needs to be tested on smaller cards.
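
For anyone curious, here is a rough sketch of the idea behind that flag (not the actual --low_vram_mode implementation): park the text encoders and VAE on the CPU so only the UNet occupies VRAM, and pull each sub-model back to the GPU just before it is needed. The attribute names follow the model_index.json above and are assumptions about the loaded pipeline object.

```python
# Rough sketch of the --low_vram_mode idea, not the actual implementation:
# keep only the UNet resident on the GPU and shuttle the rest over on demand.
import torch

def park_idle_modules(pipe, names=("text_encoder", "text_encoder_2", "vae")):
    """Move sub-models that aren't currently needed to the CPU to free VRAM."""
    for name in names:
        module = getattr(pipe, name, None)  # tolerate pipelines without text_encoder_2
        if module is not None:
            module.to("cpu")
    torch.cuda.empty_cache()

def fetch_module(pipe, name, device="cuda"):
    """Bring a single sub-model back to the GPU right before it is used."""
    module = getattr(pipe, name, None)
    if module is not None:
        module.to(device)
    return module
```

Recent Diffusers versions also ship generic helpers in the same spirit (e.g. enable_model_cpu_offload), which may be worth trying if the pipeline inherits them.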

tin2tin commented 8 months ago

Thank you. I'm developing this Blender add-on on 6 GB of VRAM, and it runs Modelscope, Zeroscope, and SDXL fine. Let me know if at some point you want me to test it on my low-spec hardware: https://github.com/tin2tin/Pallaidium