`lpw_stable_diffusion_xl` custom pipeline doesn't work with `from_single_file`

nawka12 commented 7 months ago

Describe the bug

I tested lpw_stable_diffusion_xl. It works with StableDiffusionXLPipeline.from_pretrained but not with StableDiffusionXLPipeline.from_single_file. I removed the truncated part of the prompt to check whether the prompt is really being truncated or whether the message is just an ignorable log. Here are the results:

The prompt open mouth, and aesthetic tags are truncated.

After removing the truncated prompt

Here are the same prompt and settings generated with from_pretrained, using the same model in diffusers format:

open mouth, and aesthetic tags included

without the truncated prompts

Reproduction

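import torch
from diffusers import StableDiffusionXLPipeline

# model_path is the single-file checkpoint being loaded
pipe = StableDiffusionXLPipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
    custom_pipeline="lpw_stable_diffusion_xl",
    use_safetensors=True,
)
pipe.to('cuda')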

Logs

Token indices sequence length is longer than the specified maximum sequence length for this model (92 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [', open mouth, masterpiece, best quality, very aesthetic, absurdres,']

System Info

diffusers version: 0.28.0.dev0
Platform: Windows-10-10.0.22631-SP0
Python version: 3.10.11
PyTorch version (GPU?): 2.2.2+cu118 (True)
Huggingface_hub version: 0.22.2
Transformers version: 4.39.3
Accelerate version: 0.28.0
xFormers version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @sayakpaul

anujkum25 commented 7 months ago

I believe that's only a warning.

nawka12 commented 7 months ago

@anujkum25 If it were only a warning, from_single_file would produce images similar to the ones produced by from_pretrained. But that's not the case here.

That's why I'm showing 4 different images here.

The first one is where the prompt is truncated.
The second one is where I manually removed the truncated part of the prompt. It shows the same image as the first one, which proves my point that the prompt is truncated.
The third one uses the same settings as the first image but with from_pretrained. It produces a different image from the first one and picks up the part of the prompt that was truncated in the first image (open mouth).
The fourth one uses the same settings as the second image but with from_pretrained. It produces an image similar to images 1 and 2.

The prompt is really truncated.

asomoza commented 7 months ago

Currently, from_single_file doesn't load community pipelines. As an alternative, you can download the pipeline file and use it directly.
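For anyone wondering what "use it directly" could look like, here is a minimal sketch. It assumes you have saved the community file locally as lpw_stable_diffusion_xl.py and that the class inside it is called SDXLLongPromptWeightingPipeline and exposes from_single_file. Both are assumptions to check against the file you actually download. The helper name load_class_from_file is made up for this example.

```python
import importlib.util
import sys
from pathlib import Path


def load_class_from_file(py_file, class_name):
    """Import a Python file by path and return one class defined in it."""
    path = Path(py_file)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as in the importlib recipe.
    sys.modules[path.stem] = module
    spec.loader.exec_module(module)
    return getattr(module, class_name)


# Hypothetical usage (file name, class name and model_path are assumptions):
#
#   import torch
#   LPWPipeline = load_class_from_file(
#       "lpw_stable_diffusion_xl.py", "SDXLLongPromptWeightingPipeline"
#   )
#   pipe = LPWPipeline.from_single_file(model_path, torch_dtype=torch.float16)
#   pipe.to("cuda")
```

Importing the class yourself sidesteps the custom_pipeline argument entirely, so there is no doubt about which pipeline class ends up being instantiated.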

nawka12 commented 7 months ago

Thank you for the information. Will try to implement the pipeline directly into my code.

xhinker commented 6 months ago

Correct

katarzynasornat commented 5 months ago

Hi @xhinker! I have bought your book and also tried this pipeline with the following setting presented below:

prompt = """masterpiece, best quality, (Anime:1.4), adorable toddler flying in space, surrounded by twinkling stars and colorful planets, whimsical children's fairytale scene, watercolor style, imaginative and charming, wearing a cute astronaut suit, playful and joyful expression, soft pastel colors, flowing brushstrokes, dreamy and magical atmosphere, highly detailed and sharp focus, light and airy feel, by Mary Blair and Hayao Miyazaki, Artstation"""

negative_prompt = """(worst quality, low quality:1.4), low resolution, poorly drawn features, out of frame, deformed, blurry, dark and moody, harsh shadows, overly saturated, bad proportions, childish scribbles, amateur, draft, watermark, signature"""

from diffusers import DiffusionPipeline
import torch
from google.colab import userdata
access_token = userdata.get('hf_token')
model_id_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(
    model_id_or_path,
    torch_dtype=torch.float16,
    custom_pipeline="lpw_stable_diffusion_xl",
    token=access_token,
).to("cuda:0")

generator = torch.Generator(device="cuda").manual_seed(0)
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
).images
images[0].resize((500,500))

I am getting this warning

Token indices sequence length is longer than the specified maximum sequence length for this model (79 > 77). Running this sequence through the model will result in indexing errors

which I believe should not appear. Am I doing anything wrong here?

asomoza commented 5 months ago

https://github.com/huggingface/diffusers/issues/7666#issuecomment-2059094259

xhinker commented 5 months ago

You can ignore the warning. It is emitted by the tokenizer; all tokens are still sent to the text encoder, which generates the full-length embeddings with weights. It is my honor to have you read my book :)
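To illustrate how a prompt longer than 77 tokens can still reach the text encoder in full, here is a simplified sketch of the general chunking idea. It is not the pipeline's actual code. The function name chunk_token_ids is made up, the BOS/EOS ids are CLIP's usual values used only for illustration, and padding with EOS is a simplification.

```python
def chunk_token_ids(token_ids, bos_id=49406, eos_id=49407, chunk_size=75):
    """Split prompt token ids (without BOS/EOS) into windows of chunk_size + 2.

    Each window is wrapped as [BOS] + body + padding + [EOS], so it fits the
    77-token limit of a CLIP text encoder and can be encoded on its own.
    """
    chunks = []
    # max(..., 1) makes an empty prompt still yield one fully padded window.
    for start in range(0, max(len(token_ids), 1), chunk_size):
        body = token_ids[start:start + chunk_size]
        padding = [eos_id] * (chunk_size - len(body))
        chunks.append([bos_id] + body + padding + [eos_id])
    return chunks


# A prompt with 90 content tokens becomes two 77-token windows.
chunks = chunk_token_ids(list(range(90)))
print(len(chunks), [len(c) for c in chunks])  # 2 [77, 77]
```

Each window would then be encoded separately and the resulting embeddings concatenated, which is why the tokenizer can warn about length even though nothing is dropped.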

asomoza commented 5 months ago

I must insist on this because people then compare the results and seem to get the wrong impression about diffusers. from_single_file actually doesn't load custom pipelines, so to prevent people from getting the wrong information I'll provide an easy test case for this.

If you use a simple prompt like this:

a cat playing with a (red:1.0) ball

and then change the weight of the red ball to something like 3.0 to burn the image, you get these results with from_pretrained:

1.0	3.0

So the prompt weighting clearly works. But if you do the same experiment with from_single_file, you get these results:

1.0	3.0

We can see that changing the weight of the word red doesn't do anything. Also, if you know how to debug, you can trace this and see which pipeline gets used.
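A quick way to see which pipeline gets used, without a debugger, is to look at the class of the object the loader returns. The sketch below uses made-up helper names and assumes that a stock load returns an object whose class is named StableDiffusionXLPipeline, while a successfully loaded community pipeline has a different class name.

```python
def pipeline_class_name(pipe):
    """Return the class name of a loaded pipeline object."""
    return type(pipe).__name__


def is_custom_pipeline(pipe, stock_class_name="StableDiffusionXLPipeline"):
    """Return True when the object is not an instance of the stock class.

    The stock class name is an assumption; adjust it to whichever loader
    class you called from_single_file or from_pretrained on.
    """
    return pipeline_class_name(pipe) != stock_class_name


# Hypothetical usage after loading:
#
#   print(pipeline_class_name(pipe))
#   if not is_custom_pipeline(pipe):
#       print("custom_pipeline was not applied; prompt weights will be ignored")
```

If the printed name is the stock class, the custom_pipeline argument had no effect, which matches the unchanged 1.0 and 3.0 results above.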

Also, we would appreciate it if questions about a book were asked in the corresponding repo, for example by opening a new issue there, or sent to the author directly, rather than posted here where they may confuse people. This issue is about a problem with from_single_file.

xhinker commented 5 months ago

I recall that when I was building this pipeline, DiffusionPipeline did not support loading a custom pipeline with from_single_file, so all the code was tested using from_pretrained.

It is easy to convert a safetensors checkpoint to diffusers format and then use the "from_pretrained" function: https://github.com/PacktPublishing/Using-Stable-Diffusion-with-Python/blob/main/chapter_6/load_stable_diffusion_models.ipynb
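As a rough sketch of that conversion route (not the notebook's code): load the checkpoint once with the stock pipeline, save it in diffusers format, then reload it with the community pipeline. The function names and the float16 choice are assumptions for this example.

```python
from pathlib import Path


def convert_single_file_to_diffusers(checkpoint_path, output_dir):
    """Save a single-file SDXL checkpoint as a diffusers-format folder."""
    checkpoint = Path(checkpoint_path)
    if not checkpoint.is_file():
        raise FileNotFoundError(f"No checkpoint at {checkpoint}")

    # Imported lazily so the path check above runs without heavy dependencies.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_single_file(
        str(checkpoint), torch_dtype=torch.float16
    )
    pipe.save_pretrained(output_dir)
    return output_dir


def load_with_lpw(diffusers_dir):
    """Reload the converted folder with the community pipeline attached."""
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        diffusers_dir,
        torch_dtype=torch.float16,
        custom_pipeline="lpw_stable_diffusion_xl",
    )
```

The conversion only needs to run once per checkpoint; after that, load_with_lpw on the saved folder goes through the from_pretrained path that the pipeline was tested with.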

Here is a Jupyter notebook that includes code to build the unlimited weighted prompt step by step: https://github.com/PacktPublishing/Using-Stable-Diffusion-with-Python/blob/main/chapter_10/unlock_77_token_limitation_and_prompt_weight.ipynb

asomoza commented 5 months ago

Thanks for the additional information. I posted the earlier example only so that people who find this issue don't get the wrong idea. The example in your book works as intended; I just needed to insist that this doesn't work with from_single_file. I appreciate your notebooks and the great work you did with your book.
