huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Support for Kolors #8801

Closed: JincanDeng closed this issue 2 months ago

JincanDeng commented 3 months ago

Model/Pipeline/Scheduler description

Yesterday Kwai-Kolors published their new model, Kolors, which uses a UNet as the backbone and ChatGLM3 as the text encoder.

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content.

Open source status

Provide useful links for the implementation

Implementation: https://github.com/Kwai-Kolors/Kolors
Weights: https://huggingface.co/Kwai-Kolors/Kolors

asomoza commented 3 months ago

Hi, thanks for your work, it's a nice model.

The weights seem to have been saved with errors. The diffusion_pytorch_model.safetensors that should be float32 appears to actually be the float16 one, and the float16 variant throws an error. I can open a PR to fix it if you want.
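As an aside, a quick way to verify which precision a safetensors file actually holds is to open it with the safetensors library and print a few tensor dtypes; a minimal sketch, assuming a hypothetical local copy of the UNet weights:

from safetensors import safe_open

# Hypothetical local path to the file in question; adjust as needed.
path = "unet/diffusion_pytorch_model.safetensors"

with safe_open(path, framework="pt", device="cpu") as f:
    for name in list(f.keys())[:3]:
        # The non-variant file should report torch.float32
        print(name, f.get_tensor(name).dtype)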

If you fix that, we can load the model like this:

import torch

# ChatGLMModel, ChatGLMTokenizer and this StableDiffusionXLPipeline come from the
# Kolors repository code (see the imports quoted later in this thread)
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline

# Load the ChatGLM text encoder and tokenizer from the checkpoint's text_encoder folder
text_encoder = ChatGLMModel.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder")

# Build the pipeline with the ChatGLM components swapped in for the CLIP ones
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Kwai-Kolors/Kolors",
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

So basically, for this model to work with diffusers without additional dependencies, we just need transformers to add support for ChatGLM, and then add support for it in encode_prompt.
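For reference, a rough sketch of what the ChatGLM branch of encode_prompt could look like, loosely following the reference pipeline in the Kolors repository; the 256-token max_length, the hidden-state index, and the pooling choice are assumptions taken from that reference code, not a settled diffusers API:

prompt = "a photo of an astronaut riding a horse"  # example prompt

# tokenizer / text_encoder are the ChatGLM objects loaded above
text_inputs = tokenizer(
    prompt,
    padding="max_length",
    max_length=256,
    truncation=True,
    return_tensors="pt",
).to("cuda")

output = text_encoder(
    input_ids=text_inputs["input_ids"],
    attention_mask=text_inputs["attention_mask"],
    position_ids=text_inputs["position_ids"],
    output_hidden_states=True,
)

# ChatGLM returns hidden states as [seq_len, batch, hidden]; the UNet expects
# [batch, seq_len, hidden], hence the permute. The penultimate hidden state is
# used as the per-token embedding and the last token of the final hidden state
# as the pooled embedding (both assumptions from the reference code).
prompt_embeds = output.hidden_states[-2].permute(1, 0, 2).clone()
pooled_prompt_embeds = output.hidden_states[-1][-1, :, :].clone()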

cc: @yiyixuxu @sayakpaul

CanvaChen commented 3 months ago

@asomoza We actually don't need to integrate the ChatGLM code directly into transformers. Instead, we can simply use the existing remote-code infrastructure, similar to the following snippet:

import torch
from transformers import AutoModel, AutoTokenizer

text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

s9anus98a commented 3 months ago

I keep getting out-of-memory crashes (on free Colab with under 12 GB of VRAM), even with the text encoder quantized to 4 bits:

text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True).quantize(4).cuda()
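If the quantized text encoder alone still does not fit, diffusers' offloading hooks may also help on small GPUs; a minimal sketch, assuming pipe has been built as in the snippets above and that the custom Kolors pipeline inherits the standard DiffusionPipeline offloading methods:

# Keep submodules on the CPU and move them to the GPU only when they are used.
pipe.enable_model_cpu_offload()
# For even lower VRAM at a larger speed cost:
# pipe.enable_sequential_cpu_offload()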

sayakpaul commented 3 months ago

I think we should support this model to welcome more models that are inherently multi-lingual. What do we need to get it in?

JincanDeng commented 3 months ago

(quoting asomoza's comment above)

Thank you for your suggestion. We've fixed the model's fp16 and fp32 weights on the Hugging Face Hub. However, the pipeline still throws an error when loading directly via from_pretrained.

My running code:

import torch

from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.models.modeling_chatglm import ChatGLMModel

# ckpt_dir is the local directory holding the Kwai-Kolors/Kolors checkpoint
text_encoder = ChatGLMModel.from_pretrained(ckpt_dir, subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained(ckpt_dir, subfolder="text_encoder")

pipe = StableDiffusionXLPipeline.from_pretrained(
        ckpt_dir,
        tokenizer=tokenizer,
        text_encoder=text_encoder,
        torch_dtype=torch.float16,
        variant="fp16",
).to("cuda")

The error is:

IndexError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 pipe = StableDiffusionXLPipeline.from_pretrained(
      2     ckpt_dir,
      3     tokenizer=tokenizer,
      4     text_encoder=text_encoder,
      5     torch_dtype=torch.float16,
      6     variant="fp16",
      7 ).to("cuda")

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:736, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    734 folder_path = os.path.join(cached_folder, folder)
    735 is_folder = os.path.isdir(folder_path) and folder in config_dict
--> 736 variant_exists = is_folder and any(
    737     p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
    738 )
    739 if variant_exists:
    740     model_variants[folder] = variant

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:737, in <genexpr>(.0)
    734 folder_path = os.path.join(cached_folder, folder)
    735 is_folder = os.path.isdir(folder_path) and folder in config_dict
    736 variant_exists = is_folder and any(
--> 737     p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
    738 )
    739 if variant_exists:
    740     model_variants[folder] = variant

IndexError: list index out of range

asomoza commented 3 months ago

That error is because there is a __pycache__ directory inside the text_encoder folder: the variant check splits every filename on "." and reads index [1], so an entry without a file extension makes it fail. If you delete that directory, it should work.
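In case it helps, a small sketch for removing the stray directory from a locally downloaded checkpoint; ckpt_dir is a hypothetical local path to the Kwai-Kolors/Kolors snapshot:

import os
import shutil

ckpt_dir = "/path/to/Kwai-Kolors/Kolors"  # hypothetical local checkpoint directory

# The variant check in diffusers splits every filename on "." and reads index [1],
# so an extension-less entry like __pycache__ raises the IndexError seen above.
pycache = os.path.join(ckpt_dir, "text_encoder", "__pycache__")
if os.path.isdir(pycache):
    shutil.rmtree(pycache)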


vladmandic commented 3 months ago

Tried using the standard implementation:

import torch, transformers, diffusers

text_encoder = transformers.AutoModel.from_pretrained('THUDM/chatglm3-6b', torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained('THUDM/chatglm3-6b', trust_remote_code=True)
pipe = diffusers.StableDiffusionXLPipeline.from_pretrained('Kwai-Kolors/Kolors', tokenizer=tokenizer, text_encoder=text_encoder)

This loads the text_encoder and tokenizer without issues, but fails when initializing the pipe:

Kwai-Kolors/Kolors text_encoder/kolors.py as defined in model_index.json does not exist in Kwai-Kolors/Kolors and is not a module in 'diffusers/pipelines'

The Kolors pipeline is similar-but-different to the SDXL pipeline, which means loading needs to use the actual custom pipeline class:

from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline

But you should not redefine a well-known class like StableDiffusionXLPipeline; that will break tons of other things!
Either it is a custom class, or it works as the standard StableDiffusionXLPipeline class.
And if it is a custom class, this needs a full PR.
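For completeness, a sketch of what generation looks like once the custom classes from the Kolors repository are used for loading (loading itself is the same as in the snippet further above); the call arguments mirror the standard SDXL signature and are assumptions rather than a confirmed Kolors API:

# `pipe` is the custom Kolors StableDiffusionXLPipeline loaded from
# "Kwai-Kolors/Kolors" with the ChatGLM text encoder and tokenizer, as above.
image = pipe(
    prompt="一张宇航员在月球上骑马的照片",  # Kolors accepts Chinese prompts natively
    num_inference_steps=25,
    guidance_scale=5.0,
).images[0]
image.save("kolors_sample.png")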