Closed JincanDeng closed 2 months ago
Hi, thanks for your work, it's a nice model.
The weights seem to have been saved incorrectly. The diffusion_pytorch_model.safetensors file, which should be float32, appears to be the float16 one, and the float16 variant throws an error. I can open a PR to fix it if you want.
If you fix that, we can load the model like this:
text_encoder = ChatGLMModel.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder")
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Kwai-Kolors/Kolors",
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
So basically, for this model to work with diffusers without additional dependencies, we just need transformers to add support for ChatGLM, and then add support for it in encode_prompt.
cc: @yiyixuxu @sayakpaul
@asomoza We don't actually need to integrate the ChatGLM code directly into transformers. Instead, we can simply use the existing infrastructure, as in the following snippet:
text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
I keep running out of memory (on free Colab, with under 12 GB of VRAM), even with 4-bit quantization:
text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True).quantize(4).cuda()
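For a rough sense of why this runs out of memory, here is a back-of-the-envelope estimate of the text encoder's weight size at different precisions. It assumes about 6 billion parameters (read off the "6b" in the model name, not an exact count) and ignores activations, the UNet, the VAE, and framework overhead. The helper name weight_gib is made up for this sketch.

```python
def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: roughly 6e9 parameters, taken from the "6b" in the model name.
ASSUMED_PARAMS = 6e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

As far as I can tell, the snippet above loads the weights in float16 first and only quantizes afterwards, so the float16 footprint may still have to fit in host RAM before the 4-bit copy is moved to the GPU.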
I think we should support this model to welcome more models that are inherently multi-lingual. What do we need to get it in?
Thank you for your suggestion. We've fixed the model's fp16 and fp32 weights on Hugging Face. However, the pipeline still throws an error when loading directly via from_pretrained.
My running code:
import torch
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.models.modeling_chatglm import ChatGLMModel
# ckpt_dir is defined elsewhere and points at the Kolors checkpoint
text_encoder = ChatGLMModel.from_pretrained(ckpt_dir, subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained(ckpt_dir, subfolder="text_encoder")
pipe = StableDiffusionXLPipeline.from_pretrained(
    ckpt_dir,
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
The error is:
IndexError Traceback (most recent call last)
Cell In[13], line 1
----> 1 pipe = StableDiffusionXLPipeline.from_pretrained(
2 ckpt_dir,
3 tokenizer=tokenizer,
4 text_encoder=text_encoder,
5 torch_dtype=torch.float16,
6 variant="fp16",
7 ).to("cuda")
File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
115 if check_use_auth_token:
116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)
File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:736, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
734 folder_path = os.path.join(cached_folder, folder)
735 is_folder = os.path.isdir(folder_path) and folder in config_dict
--> 736 variant_exists = is_folder and any(
737 p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
738 )
739 if variant_exists:
740 model_variants[folder] = variant
File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:737, in <genexpr>(.0)
734 folder_path = os.path.join(cached_folder, folder)
735 is_folder = os.path.isdir(folder_path) and folder in config_dict
736 variant_exists = is_folder and any(
--> 737 p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
738 )
739 if variant_exists:
740 model_variants[folder] = variant
IndexError: list index out of range
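That error occurs because there is a __pycache__ directory in the text encoder folder: its name contains no ".", so p.split(".")[1] in the variant check raises the IndexError. If you delete it, it should work.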
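To make the failure mode concrete, here is a minimal sketch that reproduces the check from the traceback on a plain list of names and shows one way to clean a checkpoint folder. The function names (variant_exists_naive, variant_exists_safe, remove_pycache) are hypothetical helpers written for this comment, not part of diffusers, and the "safe" variant is only an illustration of a guard, not what the library does.

```python
import os
import shutil


def variant_exists_naive(names, variant):
    # Same expression as in the traceback: assumes every entry contains a ".".
    return any(p.split(".")[1].startswith(variant) for p in names)


def variant_exists_safe(names, variant):
    # Illustrative guard: skip entries that have no "." in their name.
    parts = (p.split(".") for p in names)
    return any(len(x) > 1 and x[1].startswith(variant) for x in parts)


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if "__pycache__" in dirnames:
            target = os.path.join(dirpath, "__pycache__")
            shutil.rmtree(target)
            dirnames.remove("__pycache__")  # do not descend into the deleted folder
            removed.append(target)
    return removed
```

Calling remove_pycache on the local checkpoint directory before from_pretrained should clear the offending folder. Note that it comes back whenever the Python files in that folder are imported again.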
I tried using the standard implementation:
import torch, transformers, diffusers
text_encoder = transformers.AutoModel.from_pretrained('THUDM/chatglm3-6b', torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained('THUDM/chatglm3-6b', trust_remote_code=True)
pipe = diffusers.StableDiffusionXLPipeline.from_pretrained('Kwai-Kolors/Kolors', tokenizer=tokenizer, text_encoder=text_encoder)
This loads text_encoder and tokenizer without issues, but fails when initializing pipe:
Kwai-Kolors/Kolors text_encoder/kolors.py as defined in model_index.json does not exist in Kwai-Kolors/Kolors and is not a module in 'diffusers/pipelines'
The Kolors pipeline is similar to, but not the same as, the SDXL pipeline, which means loading has to use the actual custom pipeline class:
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline
But you should not redefine the well-known StableDiffusionXLPipeline class; that will break many other things!
Either it is a custom class, or it works with the standard StableDiffusionXLPipeline class. And if it is a custom class, this needs a full PR.
Model/Pipeline/Scheduler description
Yesterday Kwai-Kolors published their new model, Kolors, which uses a UNet as its backbone and ChatGLM3 as its text encoder.
Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content.
Open source status
Provide useful links for the implementation
Implementation: https://github.com/Kwai-Kolors/Kolors
Weights: https://huggingface.co/Kwai-Kolors/Kolors