huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[regression] StableDiffusionPipeline.from_single_file() does not handle SD 1.5 models with prediction_type `v_prediction` #9171

Open lstein opened 4 weeks ago

lstein commented 4 weeks ago

Describe the bug

There are a few Stable Diffusion 1.5 models that use a prediction type of v_prediction rather than epsilon. In version 0.27.0, StableDiffusionPipeline.from_single_file() correctly detected and rendered images from such models. However, in version 0.30.0, these models are always treated as epsilon, even when the correct prediction_type and original_config arguments are set.
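To see why a v_prediction checkpoint renders incorrectly when it is treated as epsilon, here is a minimal scalar sketch of the two standard parameterizations. `alpha_bar` stands for the cumulative noise-schedule product at a given timestep. The helper names and toy numbers are illustrative only and are not diffusers APIs.

```python
import math

# Toy scalar example. alpha_bar is the cumulative noise-schedule product at step t.
# Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
# v-target:        v   = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0

def x0_from_epsilon(x_t, eps_pred, alpha_bar):
    # Recover the clean sample assuming the model predicted the noise.
    return (x_t - math.sqrt(1 - alpha_bar) * eps_pred) / math.sqrt(alpha_bar)

def x0_from_v(x_t, v_pred, alpha_bar):
    # Recover the clean sample assuming the model predicted v.
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1 - alpha_bar) * v_pred

x0, eps, alpha_bar = 1.0, 0.5, 0.64
x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps
v = math.sqrt(alpha_bar) * eps - math.sqrt(1 - alpha_bar) * x0

print(x0_from_v(x_t, v, alpha_bar))        # correct decoding: ~1.0
print(x0_from_epsilon(x_t, v, alpha_bar))  # v output misread as epsilon: ~1.525
```

Decoding the same network output with the wrong formula biases every denoising step, which is consistent with the degraded sushi.png described in this report.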

Reproduction

You will need to download the original config file, EasyFluffV11.yaml, into the current directory for this to work. After running, the file sushi.png will show incorrect rendering.

from diffusers import StableDiffusionPipeline
import torch

model_id = 'https://huggingface.co/zatochu/EasyFluff/blob/main/EasyFluffV11.safetensors'
yaml_path = './EasyFluffV11.yaml'

pipe = StableDiffusionPipeline.from_single_file(model_id,
                                                original_config=yaml_path,
                                                prediction_type='v_prediction',
                                                torch_dtype=torch.float16,
                                                ).to("cuda")
prompt = "banana sushi"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("sushi.png")

Logs

Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 7330.37it/s]
Loading pipeline components...:   0%|                                                                                                                    | 0/6 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel: 
 ['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 26.26it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 16.72it/s]

System Info

Who can help?

@yiyixuxu @asomoza

DN6 commented 4 weeks ago

@lstein Can you share the image outputs from v0.29.2 and v0.30.0?

DN6 commented 4 weeks ago

By any chance do you have runwayml/stable-diffusion-v1-5 saved in your HF Cache directory?

lstein commented 4 weeks ago

> @lstein Can you share the image outputs from v0.29.2 and v0.30.0?

My bad. The regression is present in 0.29.2 as well. The previous working version was 0.27.0. I have amended the bug report.

Here is the output of the script run with diffusers 0.27.0 vs. 0.30.0. Also note the difference in image size: 0.27.0 apparently thinks this is an SD 2 model.

[Images: output from 0.27.0 (sushi-0.27) and output from 0.30.0 (sushi-0.30)]

lstein commented 4 weeks ago

> By any chance do you have runwayml/stable-diffusion-v1-5 saved in your HF Cache directory?

Indeed yes. I've seen that from_single_file() downloads it into the cache if it isn't there already. This seems to be the way it gets the component .json config files for the base model of the checkpoint file being loaded.

DN6 commented 3 weeks ago

Hi @lstein, yes, we updated single-file loading to rely on the model cache/configs to set up the pipelines. This lets us support single-file loading for a wider range of models. The prediction_type argument is deprecated and will be removed eventually, although we should show a warning here. I will open a PR for it.

I noticed that the scheduler in the repo you linked does contain a config that sets v_prediction. You can configure your pipeline in the following way to enable correct inference.

from diffusers import StableDiffusionPipeline
import torch

model_id = 'https://huggingface.co/zatochu/EasyFluff/blob/main/EasyFluffV11.safetensors'

pipe = StableDiffusionPipeline.from_single_file(
    model_id,
    config="zatochu/EasyFluff",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "banana sushi"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("sushi.png")

lstein commented 3 weeks ago

> I noticed that the scheduler in the repo you linked does contain a config that sets v_prediction. You can configure your pipeline in the following way to enable correct inference.

I'm a developer of InvokeAI, and am trying to support users who import arbitrary .safetensors models, so it will be difficult to find a general mechanism to identify the diffusers model with a config that matches what the safetensors file needs. Can you suggest how to do this?

DN6 commented 3 weeks ago

In most cases we can automatically match the checkpoint to the appropriate config, provided that the .safetensors file is in the original format and not the diffusers format. If you compare the keys of a single-file checkpoint with those of a diffusers checkpoint, you will notice that they are different.
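A rough way to check which layout a .safetensors file uses is to look at its tensor-name prefixes. This is a minimal sketch: `detect_checkpoint_format` and the prefix lists are hypothetical and are not the heuristics diffusers itself uses. It assumes original-format Stable Diffusion checkpoints prefix UNet, VAE and text-encoder weights with `model.diffusion_model.`, `first_stage_model.` and `cond_stage_model.`, while a diffusers-format UNet file uses names such as `conv_in.` and `down_blocks.`.

```python
# Hypothetical helper: classify a checkpoint's layout from its tensor names.
ORIGINAL_PREFIXES = ("model.diffusion_model.", "first_stage_model.", "cond_stage_model.")
DIFFUSERS_UNET_PREFIXES = ("conv_in.", "time_embedding.", "down_blocks.", "mid_block.", "up_blocks.")


def detect_checkpoint_format(keys):
    """Return "original", "diffusers" or "unknown" for an iterable of tensor names."""
    keys = list(keys)
    # str.startswith accepts a tuple and matches if any prefix matches.
    if any(k.startswith(ORIGINAL_PREFIXES) for k in keys):
        return "original"
    if any(k.startswith(DIFFUSERS_UNET_PREFIXES) for k in keys):
        return "diffusers"
    return "unknown"


print(detect_checkpoint_format(["model.diffusion_model.input_blocks.0.0.weight"]))  # original
print(detect_checkpoint_format(["conv_in.weight", "down_blocks.0.resnets.0.conv1.weight"]))  # diffusers
```

To get the names from a real file, `safetensors.safe_open(path, framework="pt")` returns a handle whose `keys()` method lists the stored tensor names.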

In this particular case you're setting the prediction_type argument anyway since the YAML configs do not contain that information either.

You could configure a scheduler beforehand with the prediction type and set it on the pipeline.

For example:

from diffusers import StableDiffusionPipeline, DDIMScheduler

ckpt_path = "https://huggingface.co/zatochu/EasyFluff/blob/main/EasyFluffV11.safetensors"
pipe = StableDiffusionPipeline.from_single_file(ckpt_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, prediction_type="v_prediction")
print(pipe.scheduler.config.prediction_type)

from_single_file operates on the assumption that you are trying to load a checkpoint saved in the original format. We could update or add a util function in diffusers.loaders.single_file_utils that raises an error if we can't match the checkpoint to an appropriate config. The current behaviour is to default to SD 1.5, which can be confusing.
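The "raise instead of defaulting to SD 1.5" idea could look roughly like the sketch below. `match_base_config`, the probe key and the repo mapping are all hypothetical illustrations, not the diffusers implementation. The sketch assumes that the cross-attention key projection in the first UNet down block has shape (320, context_dim), with context_dim 768 for SD 1.x (CLIP ViT-L) and 1024 for SD 2.x (OpenCLIP ViT-H).

```python
# Hypothetical matcher: pick a base diffusers repo from tensor shapes, or fail loudly.
# Assumption: this to_k weight has shape (320, context_dim) in original-format SD checkpoints.
PROBE_KEY = "model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight"

# Assumed mapping from text-encoder context dim to a base repo with matching configs.
BASE_REPOS = {
    768: "runwayml/stable-diffusion-v1-5",
    1024: "stabilityai/stable-diffusion-2-1",
}


def match_base_config(shapes):
    """shapes: mapping of tensor name -> shape tuple. Raises instead of guessing."""
    shape = shapes.get(PROBE_KEY)
    if shape is None:
        raise ValueError(f"Cannot match checkpoint: {PROBE_KEY!r} not found")
    context_dim = shape[-1]
    if context_dim not in BASE_REPOS:
        raise ValueError(f"Cannot match checkpoint: unexpected context dim {context_dim}")
    return BASE_REPOS[context_dim]


print(match_base_config({PROBE_KEY: (320, 768)}))  # runwayml/stable-diffusion-v1-5
```

Architecture alone cannot reveal the prediction type: EasyFluffV11 has SD 1.5 shapes but uses v_prediction, so any shape-based matcher will still land on an epsilon scheduler config unless the prediction type is supplied separately. Shapes for a real file can be read with `safe_open(...).get_slice(name).get_shape()` from the safetensors library.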

Do you happen to have a list of the models for which you need to support these arbitrary .safetensors files? That would help me understand your requirements a bit better.

yiyixuxu commented 3 weeks ago

The YAML file does specify v_prediction, though: https://huggingface.co/zatochu/EasyFluff/blob/main/EasyFluffV11.yaml#L5

Should we consider adding a special check for this config when a YAML file is passed? I think this is really an edge case, where a fine-tuned checkpoint can have a different configuration from the base checkpoint.
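A special check of that kind might read the prediction type straight from the parsed YAML. This is a minimal sketch, assuming the original LDM convention that the YAML stores it under `model.params.parameterization`, with `"v"` for v-prediction and `"eps"` (the default when the key is absent) for epsilon. The helper name is hypothetical.

```python
# Hypothetical helper: map an original LDM YAML config (already parsed into a dict)
# to a diffusers-style prediction_type string.
PARAMETERIZATION_TO_PREDICTION_TYPE = {"eps": "epsilon", "v": "v_prediction"}


def prediction_type_from_original_config(original_config):
    params = original_config.get("model", {}).get("params", {})
    # Assumption: LDM treats a missing "parameterization" as "eps".
    parameterization = params.get("parameterization", "eps")
    try:
        return PARAMETERIZATION_TO_PREDICTION_TYPE[parameterization]
    except KeyError:
        raise ValueError(f"Unsupported parameterization: {parameterization!r}") from None


cfg = {"model": {"params": {"parameterization": "v"}}}
print(prediction_type_from_original_config(cfg))  # v_prediction
```

With PyYAML, `yaml.safe_load` on EasyFluffV11.yaml produces the dict, and the returned string could be passed to `DDIMScheduler.from_config(pipe.scheduler.config, prediction_type=...)` as in DN6's example.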

DN6 commented 3 weeks ago

Ah, my bad, I missed that. But even in earlier versions, we relied on the prediction_type argument to configure the scheduler. It wasn't set from the YAML.

https://github.com/huggingface/diffusers/blob/b69fd990ad8026f21893499ab396d969b62bb8cc/src/diffusers/loaders/single_file_utils.py#L1546

In the current version, setting the prediction type via prediction_type only works if:

  1. local_files_only=True
  2. A cached config for the model isn't present locally.

The reasoning was to encourage setting the prediction type via the Scheduler object and passing that object to the pipeline, as we do for from_pretrained. I think I missed this potential path during the refactor, so it is a breaking change. We could add extra checks for legacy kwargs and update the loading, but these kwargs are slated for removal and this is a bit of an edge case. I would recommend following the same configuration process as from_pretrained when doing single-file loading: either configure the scheduler object beforehand or use the config argument.
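The two conditions listed in this comment can be summarized as a small decision function. This is a simplified, hypothetical model of the behaviour described here, not the diffusers source; `config_cached` stands for whether the base model's scheduler config is already in the local HF cache, and `base_prediction_type` for the value in that base config.

```python
# Simplified, hypothetical model of how the legacy prediction_type kwarg is honoured.
def effective_prediction_type(prediction_type_kwarg, local_files_only, config_cached,
                              base_prediction_type="epsilon"):
    # Legacy path: offline and nothing cached, so the kwarg configures the scheduler.
    if local_files_only and not config_cached:
        return prediction_type_kwarg or "epsilon"
    # Otherwise the base model's scheduler config (cached or downloaded) is used,
    # and the kwarg is silently ignored.
    return base_prediction_type


# Reproduction case: base SD 1.5 config already cached, online loading.
print(effective_prediction_type("v_prediction", local_files_only=False, config_cached=True))  # epsilon
```

In the reproduction, runwayml/stable-diffusion-v1-5 was already cached, so the kwarg has no effect under this model, matching what lstein observed.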

yiyixuxu commented 3 weeks ago

@lstein can you let us know if the solution @DN6 proposed here works for you? https://github.com/huggingface/diffusers/issues/9171#issuecomment-2295704043

DN6 commented 3 weeks ago

PR to address the current issue: https://github.com/huggingface/diffusers/pull/9229