huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[SD-XL] Running SD XL via "from_single_file" unnecessarily downloads a 10 GB CLIP model #4021

Closed · n00mkrad closed this issue 1 year ago

n00mkrad commented 1 year ago

Describe the bug

Running StableDiffusionXLPipeline downloads laion/CLIP-ViT-bigG-14-laion2B-39B-b160k, which is about 10 GB in size.

Is it possible to avoid this big download?

ComfyUI, for example, seems to be able to run SDXL without this huge CLIP model.

Reproduction

Run this code example:

https://github.com/huggingface/diffusers/releases/tag/v0.18.1

The script will load the SD model, then download laion/CLIP-ViT-bigG-14-laion2B-39B-b160k into the default HF cache directory.

Logs

python sdxl.py
global_step key not found in model
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: [...]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading (…)olve/main/vocab.json: 100%|██████████| 862k/862k [00:00<00:00, 3.10MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 1.94MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 389/389 [00:00<?, ?B/s]
Downloading (…)okenizer_config.json: 100%|██████████| 904/904 [00:00<?, ?B/s]
Downloading (…)lve/main/config.json: 100%|██████████| 4.76k/4.76k [00:00<00:00, 4.77MB/s]
Downloading (…)model.bin.index.json: 100%|██████████| 120k/120k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
Downloading (…)l-00001-of-00002.bin:  35%|███▌      | 3.52G/9.99G [01:33<02:57, 36.5MB/s]
[...]

System Info

- Diffusers 0.18.1
- Windows 10 64-bit
- Python 3.10.7
- PyTorch 2.0.1

Who can help?

No response

n00mkrad commented 1 year ago

For context, this is my ComfyUI setup, which appears to load the CLIP weights from the SDXL safetensors file itself instead of downloading laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.

[screenshot of the ComfyUI workflow]

bghira commented 1 year ago

You can do the same thing using `DiffusionPipeline.from_single_file('/path/to/file.ckpt', use_safetensors=True)`

n00mkrad commented 1 year ago

> You can do the same thing using `DiffusionPipeline.from_single_file('/path/to/file.ckpt', use_safetensors=True)`

`NameError: name 'DiffusionPipeline' is not defined`

If I import it:

`AttributeError: type object 'DiffusionPipeline' has no attribute 'from_single_file'`

Using `from_pretrained` instead won't allow me to load from safetensors:

`HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'M:/Weights/SD/XL/sd_xl_base_0.9.safetensors'. Use repo_type argument if needed.`

bghira commented 1 year ago

@patrickvonplaten may have only added the `SingleFileMixin` to the SDXL pipeline.

Try using `StableDiffusionXLPipeline` directly.

n00mkrad commented 1 year ago

> Try using `StableDiffusionXLPipeline` directly.

What do you mean by "directly"?

This is my code:

```python
from diffusers import StableDiffusionXLPipeline
import torch
import sys

pipe = StableDiffusionXLPipeline.from_single_file("M:/Weights/SD/XL/sd_xl_base_0.9.safetensors", torch_dtype=torch.float16)
pipe.to("cuda")
prompt = "astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image.save(f"{sys.path[0]}/sdxl-test.png")
```

This tries to download CLIP-ViT-bigG-14-laion2B-39B-b160k, which I want to avoid because it is 10 GB. ComfyUI runs without downloading this model, so it must be possible.

patrickvonplaten commented 1 year ago

Hey @n00mkrad,

Could you please load the model as described here: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#texttoimage

I'm working on improving the `from_single_file` loading functionality.

n00mkrad commented 1 year ago

> Hey @n00mkrad,
>
> Could you please load the model as described here: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#texttoimage
>
> I'm working on improving the `from_single_file` loading functionality.

Yep, `from_pretrained` works; it does not attempt to download laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.

By the way, `from_single_file` also eats a ton of RAM when first loading the model (maxed out 32 GB). I guess the conversion is causing it.

FurkanGozukara commented 1 year ago

> Hey @n00mkrad, Could you please load the model as described here: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#texttoimage I'm working on improving the `from_single_file` loading functionality.
>
> Yep, `from_pretrained` works; it does not attempt to download laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.
>
> By the way, `from_single_file` also eats a ton of RAM when first loading the model (maxed out 32 GB). I guess the conversion is causing it.

32 GB is huge. I am also interested in loading a single safetensors file.

sayakpaul commented 1 year ago

So, it seems like `from_single_file` is where the issue is. Could we maybe edit the original post to make that a bit clearer?

patrickvonplaten commented 1 year ago

Working on improving it

patrickvonplaten commented 1 year ago

@n00mkrad,

Let me know if we can close this issue now that #4041 is merged

n00mkrad commented 1 year ago

> @n00mkrad,
>
> Let me know if we can close this issue now that #4041 is merged

Negative. Still attempts to download CLIP-ViT-bigG-14-laion2B-39B-b160k.

patrickvonplaten commented 1 year ago

It really shouldn't 😅 Can you copy-paste your diffusers versions here?

n00mkrad commented 1 year ago

> It really shouldn't 😅 Can you copy-paste your diffusers versions here?

I just updated to the latest master, it works now.

However, conversion temporarily eats up about 34 GB RAM, is this expected behavior?

bghira commented 1 year ago

You can't memory-map the old checkpoint style, but I'm not sure if that's the specific reason.

patrickvonplaten commented 1 year ago

> It really shouldn't 😅 Can you copy-paste your diffusers versions here?
>
> I just updated to the latest master, it works now.
>
> However, conversion temporarily eats up about 34 GB RAM, is this expected behavior?

CPU RAM, no? Yeah, that doesn't shock me too much since we're working in fp32 precision.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

EstherLee1995 commented 1 year ago

I have downloaded it locally. Does anyone know which folder to put it in? I am working on a cloud machine that cannot connect to Hugging Face.