huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

lpw_stable_diffusion_xl doesn't work #4848

Closed: sergeykorablin closed this issue 1 year ago

sergeykorablin commented 1 year ago

Describe the bug

Prompt is truncated to 77 tokens

prompt="a huge exhibition hall, similar to a hangar, it has no windows, the walls are filled with projection with scenes of winter nature and ice of Baikal, the floor of the hall is made of fabric, under which there are fans, the fabric floats above the floor in waves, also the fabric is illuminated by projection and glows as if filled with a sea of fireflies, in the center of the hall is a digital tree, which is made of white plaster, screens and neon threads, the crown of the tree is made of fabric, screens and neon threads, through the hall is a bridge that connects the entrance and exit"

Reproduction

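import torch
from diffusers import StableDiffusionXLPipeline

sdxl_model_id = "./path_to_model.safetensors"

# `prompt` is the long prompt quoted above.
# The negative prompt was not included in the report; an empty placeholder is used here.
negative_prompt = ""

pipe_text2img = StableDiffusionXLPipeline.from_single_file(
    sdxl_model_id,
    custom_pipeline="lpw_stable_diffusion_xl",
    use_safetensors=True,
    variant="fp16",
    torch_dtype=torch.float16).to('cuda')

images = pipe_text2img(prompt=prompt, width=768, height=768, negative_prompt=negative_prompt, num_inference_steps=20).images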

Logs

Token indices sequence length is longer than the specified maximum sequence length for this model (130 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['flies, in the center of the hall is a digital tree, which is made of white plaster, screens and neon threads, the crown of the tree is made of fabric, screens and neon threads, through the hall is a bridge that connects the entrance and exit']
Token indices sequence length is longer than the specified maximum sequence length for this model (130 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['flies, in the center of the hall is a digital tree, which is made of white plaster, screens and neon threads, the crown of the tree is made of fabric, screens and neon threads, through the hall is a bridge that connects the entrance and exit']

^^^^ The message above is printed twice.

System Info

diffusers version: 0.20.1
OS: Fedora Linux 38
Driver version: 535.86.10
CUDA version: 12.2

Who can help?

@patrickvonplaten

sergeykorablin commented 1 year ago

This looks similar to #3597.

patrickvonplaten commented 1 year ago

Could you ping the author of the lpw_stable_diffusion_xl pipeline here?

sergeykorablin commented 1 year ago

@xhinker, could you take a look? Is this a bug, or am I doing something wrong?

xhinker commented 1 year ago

> @xhinker, could you take a look? Is this a bug, or am I doing something wrong?

You can safely ignore the warning; it comes from the tokenizer. Your entire prompt is still used.
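For context, here is a minimal sketch of the general idea behind long-prompt handling: the token ids are split into windows that each fit CLIP's 77-token limit, so nothing has to be dropped. This is an illustration only, not the actual code of the lpw_stable_diffusion_xl pipeline. The function name `chunk_token_ids` is hypothetical, and padding the last window with the end-of-text id is an assumption.

```python
from typing import List

# CLIP's begin/end-of-text token ids (vocabulary size 49408).
BOS_ID = 49406
EOS_ID = 49407


def chunk_token_ids(
    token_ids: List[int],
    bos_id: int = BOS_ID,
    eos_id: int = EOS_ID,
    window: int = 77,
) -> List[List[int]]:
    """Split prompt token ids (without BOS/EOS) into CLIP-sized windows.

    Each window holds `window - 2` prompt tokens wrapped in BOS ... EOS.
    The last window is padded with `eos_id` (an assumption in this sketch).
    """
    body = window - 2  # room left after BOS and EOS
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), body):
        piece = token_ids[start:start + body]
        padded = piece + [eos_id] * (body - len(piece))
        chunks.append([bos_id] + padded + [eos_id])
    if not chunks:
        # Empty prompt: emit a single window of BOS followed by padding.
        chunks.append([bos_id] + [eos_id] * (window - 1))
    return chunks


# The log reports 130 tokens including BOS and EOS, i.e. 128 prompt tokens.
# Placeholder ids stand in for the real tokenizer output here.
example_chunks = chunk_token_ids(list(range(128)))
```

Each window can then be encoded separately and the resulting embeddings concatenated along the sequence dimension, which is why a pipeline built this way can use a prompt longer than 77 tokens even though the tokenizer still prints its length warning.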

kazuph commented 6 months ago

Currently, from_single_file doesn't load community pipelines. As an alternative, you can download the pipeline file and use it directly: https://github.com/huggingface/diffusers/issues/7666#issuecomment-2059094259

This is not a warning, but an unimplemented feature.
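One way to check which case applies is to inspect the class of the object that was actually returned. The sketch below is a small, hypothetical helper (`is_expected_pipeline` is not part of diffusers). The class name `SDXLLongPromptWeightingPipeline` in the usage comment is an assumption about what the community file defines, so verify it against the file you downloaded.

```python
def is_expected_pipeline(pipe: object, expected_class_name: str) -> bool:
    """Return True if `pipe` is an instance of a class with the given name.

    Walks the method resolution order so subclasses of the expected
    pipeline class also count as a match.
    """
    return any(cls.__name__ == expected_class_name for cls in type(pipe).__mro__)


# Usage with the pipeline object from the reproduction above (assumed names):
#   print(type(pipe_text2img).__name__)
#   if not is_expected_pipeline(pipe_text2img, "SDXLLongPromptWeightingPipeline"):
#       print("The community pipeline was not loaded; prompts will be truncated.")
```

If the printed class name is the stock StableDiffusionXLPipeline, the custom_pipeline argument had no effect and the truncation message describes what really happened to the prompt.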