lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0
39.98k stars 5.52k forks

[Feature Request]: Add Hunyuan-DiT as base model #2984

Closed elmoBG8 closed 3 months ago

elmoBG8 commented 3 months ago

Is there an existing issue for this?

What would your feature do?

Hunyuan-DiT is a novel and powerful image generation model that is small and on par with SD3. Is there a way to add it as an optional base model?

Proposed workflow

  1. Go to https://github.com/Tencent/HunyuanDiT for more details

Additional information

No response

newxhy commented 3 months ago

No, this thing requires 12GB of graphics memory

mashb1t commented 3 months ago

@elmoBG8 sadly I have to agree. Integrating a pipeline for this model would only make it accessible to a small share of Fooocus users, and compared to the effort it would take, the benefit for the majority of users just isn't there yet. Sorry.

elmoBG8 commented 3 months ago

Understandable, it's not the right moment (yet). Thank you anyway!!!


xhoxye commented 3 months ago

Run HunyuanDiTPipeline from Diffusers under 6 GB of GPU VRAM (diffusers 0.29.0.dev0).

By loading the T5 text encoder in 8 bits, you can run the pipeline in just under 6 GB of GPU VRAM. Refer to the script linked below for details.

https://new.reddit.com/r/StableDiffusion/comments/1d7vlsl/hunyuan_dit_in_diffusers_has_landed/

https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuandit

https://gist.github.com/sayakpaul/3154605f6af05b98a41081aaba5ca43e
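The recipe behind that number is a two-stage trick: encode the prompt with the quantized T5 first, free it, then load the denoiser and run from the cached embeddings. A minimal, untested sketch of that flow is below; the model ID, the `encode_prompt` signature (`text_encoder_index`, the 4-tuple return), and the `*_2` call arguments are assumptions drawn from the diffusers HunyuanDiT docs and gist, and may differ between diffusers versions. Requires diffusers >= 0.29, transformers, bitsandbytes, and a CUDA GPU.

```python
# Hypothetical sketch of the two-stage low-VRAM recipe, assuming the
# diffusers HunyuanDiT API as documented around 0.29. Not a definitive
# implementation; argument names may differ across versions.
import gc


def flush():
    """Free GPU memory between the two stages."""
    import torch
    gc.collect()
    torch.cuda.empty_cache()


def generate_low_vram(prompt: str):
    import torch
    from diffusers import HunyuanDiTPipeline
    from transformers import T5EncoderModel

    repo = "Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers"  # assumed model ID

    # Stage 1: load only the text encoders (T5 in 8 bits), skip the
    # heavy transformer/VAE, and cache the prompt embeddings.
    text_encoder_2 = T5EncoderModel.from_pretrained(
        repo, subfolder="text_encoder_2", load_in_8bit=True, device_map="auto"
    )
    pipe = HunyuanDiTPipeline.from_pretrained(
        repo,
        text_encoder_2=text_encoder_2,
        transformer=None,  # don't load the denoiser yet
        vae=None,
        torch_dtype=torch.float16,
    )
    with torch.no_grad():
        # index 0 = CLIP encoder, index 1 = T5 encoder (assumed).
        emb1, neg1, mask1, neg_mask1 = pipe.encode_prompt(
            prompt, device="cuda", dtype=torch.float16, text_encoder_index=0
        )
        emb2, neg2, mask2, neg_mask2 = pipe.encode_prompt(
            prompt, device="cuda", dtype=torch.float16, text_encoder_index=1
        )
    del text_encoder_2, pipe
    flush()

    # Stage 2: reload without text encoders and denoise from the
    # cached embeddings.
    pipe = HunyuanDiTPipeline.from_pretrained(
        repo, text_encoder=None, text_encoder_2=None, torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(
        prompt_embeds=emb1,
        negative_prompt_embeds=neg1,
        prompt_attention_mask=mask1,
        negative_prompt_attention_mask=neg_mask1,
        prompt_embeds_2=emb2,
        negative_prompt_embeds_2=neg2,
        prompt_attention_mask_2=mask2,
        negative_prompt_attention_mask_2=neg_mask2,
    ).images[0]
    return image
```

Because the transformer and the full-precision T5 are never resident at the same time, peak VRAM stays under the 12 GB figure mentioned above.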

IPv6 commented 2 months ago

Run HunyuanDiTPipeline from Diffusers under 6 GB of GPU VRAM (diffusers 0.29.0.dev0).

This is good! But there is still another problem: ControlNets, inpaint models, and all that. Someone has to retrain them from scratch for the new architecture; it is not compatible with SDXL.

xhoxye commented 2 months ago

@IPv6 It's just a text-to-image model

IPv6 commented 2 months ago

@IPv6 It's just a text-to-image model

SDXL is also a text-to-image model; image-to-image capabilities and ControlNets are added separately.

This is a prominent part of Fooocus, and it does not exist for the DiT pipeline for now. They are promising some ControlNets in the future, but those are not released yet, and there is still the open question of inpainting: it requires a separately trained model, specific to the DiT architecture.