huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

AttributeError: 'NoneType' object has no attribute 'tokenize' for InstructP2P-SDXL #8463

Closed liming-ai closed 5 months ago

liming-ai commented 5 months ago

Describe the bug

Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  7.77it/s]
Traceback (most recent call last):
  File "/mnt/bn/sijie-us-nas-8/liming/code/diffusers/test.py", line 12, in <module>
    edited_image = pipe(
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_instruct_pix2pix.py", line 800, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_instruct_pix2pix.py", line 311, in encode_prompt
    prompt = self.maybe_convert_prompt(prompt, tokenizer)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 138, in maybe_convert_prompt
    prompts = [self._maybe_convert_prompt(p, tokenizer) for p in prompts]
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 138, in <listcomp>
    prompts = [self._maybe_convert_prompt(p, tokenizer) for p in prompts]
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 161, in _maybe_convert_prompt
    tokens = tokenizer.tokenize(prompt)
AttributeError: 'NoneType' object has no attribute 'tokenize'

Reproduction

This is an official example from HQ-Edit

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline
from diffusers.utils import load_image
resolution = 768
image = load_image(
    "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
).resize((resolution, resolution))
edit_instruction = "Turn sky into a cloudy one"
pipe = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
    "UCSC-VLAA/HQ-Edit", torch_dtype=torch.float16
).to("cuda")
edited_image = pipe(
    prompt=edit_instruction,
    image=image,
    height=resolution,
    width=resolution,
    guidance_scale=3.0,
    image_guidance_scale=1.5,
    num_inference_steps=30,
).images[0]
edited_image.save("edited_image.png")

Logs

No response

System Info

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

Who can help?

No response

sayakpaul commented 5 months ago

Would you mind opening an issue on their model repository? 'Cause this code seems to be the official one.

yiyixuxu commented 5 months ago

@sayakpaul

do you know why we make text_encoder/tokenizer optional and text_encoder_2/tokenizer_2 required? I think it's more natural to have text_encoder_2/tokenizer_2 optional, no?

does it make sense for them to be interchangeable? i.e.

if self.tokenizer is not None and self.tokenizer_2 is not None:
    tokenizers = [self.tokenizer, self.tokenizer_2]
elif self.tokenizer is not None:
    tokenizers = [self.tokenizer]
else:
    tokenizers = [self.tokenizer_2]

https://github.com/huggingface/diffusers/blob/1d9a6a81b9a00b2dfab992ba75eb55bf8afb4eae/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L363
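For illustration only, the interchangeable-tokenizer idea above can be sketched as a standalone helper. This is not the library's implementation: select_tokenizers is a hypothetical name, and the SimpleNamespace objects are stand-ins for real pipelines, with plain strings in place of tokenizer objects.

```python
from types import SimpleNamespace


def select_tokenizers(pipe):
    """Return the pipeline's tokenizers that are not None, in order.

    Hypothetical helper: `pipe` is any object that may expose
    `tokenizer` and `tokenizer_2` attributes.
    """
    candidates = [
        getattr(pipe, "tokenizer", None),
        getattr(pipe, "tokenizer_2", None),
    ]
    tokenizers = [tok for tok in candidates if tok is not None]
    if not tokenizers:
        # Fail early with a clear message instead of an AttributeError
        # on None deep inside prompt encoding.
        raise ValueError("Pipeline has neither `tokenizer` nor `tokenizer_2`.")
    return tokenizers


# Stand-in objects (assumptions, not real diffusers pipelines).
base_like = SimpleNamespace(tokenizer="tok_1", tokenizer_2="tok_2")
refiner_like = SimpleNamespace(tokenizer=None, tokenizer_2="tok_2")
first_only = SimpleNamespace(tokenizer="tok_1", tokenizer_2=None)
```

With this shape, a pipeline that has only one of the two tokenizers still gets a usable list, and a pipeline with neither raises a descriptive error before any prompt is tokenized.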

sayakpaul commented 5 months ago

Yes. I will take a look.

sayakpaul commented 5 months ago

@yiyixuxu I took a look. The changes you suggested would warrant a library-wide rewrite of the SDXL encode_prompt() method, I think.

I think one of the reasons why text_encoder_2 / tokenizer_2 weren't made optional is the refiner component. It doesn't have tokenizer / text_encoder but does have text_encoder_2 / tokenizer_2.
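A quick way to see which text components a loaded pipeline actually has is to look for None entries in its component mapping. The sketch below assumes a plain dict such as the one returned by a pipeline's components property; missing_components is a hypothetical helper name, and the dict used here is a stand-in with made-up values.

```python
def missing_components(
    components,
    names=("tokenizer", "tokenizer_2", "text_encoder", "text_encoder_2"),
):
    """Return the names in `names` whose entry is absent or None."""
    return [name for name in names if components.get(name) is None]


# Stand-in for a refiner-style pipeline: only the second tokenizer and
# text encoder are present (values are placeholders, not real models).
refiner_like_components = {
    "tokenizer": None,
    "text_encoder": None,
    "tokenizer_2": object(),
    "text_encoder_2": object(),
}
```

On a real pipeline this would be called as missing_components(pipe.components) right after from_pretrained, before running inference.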

sayakpaul commented 5 months ago

Okay, the example should have used StableDiffusionInstructPix2PixPipeline and not the SDXL variant.
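A minimal sketch of the reproduction script with that change applied is below. It has not been run here and rests on assumptions: that the UCSC-VLAA/HQ-Edit checkpoint loads with the non-XL pipeline as stated above, and that height and width can be dropped because, as far as I know, the non-XL pipeline takes the output size from the input image. edit_with_hq_edit is a hypothetical wrapper name.

```python
def edit_with_hq_edit(image_url, instruction, resolution=768, device="cuda"):
    """Edit an image with the non-XL InstructPix2Pix pipeline (untested sketch)."""
    # Imported inside the function so that merely defining it does not
    # require torch/diffusers to be installed.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline
    from diffusers.utils import load_image

    image = load_image(image_url).resize((resolution, resolution))

    # Non-XL pipeline class, per the comment above.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "UCSC-VLAA/HQ-Edit", torch_dtype=torch.float16
    ).to(device)

    # height/width are omitted (assumption: size follows the input image).
    result = pipe(
        prompt=instruction,
        image=image,
        guidance_scale=3.0,
        image_guidance_scale=1.5,
        num_inference_steps=30,
    )
    return result.images[0]


# Example call (requires a GPU and a model download):
# edited = edit_with_hq_edit(
#     "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png",
#     "Turn sky into a cloudy one",
# )
# edited.save("edited_image.png")
```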

ALR-alr commented 5 months ago

I have the same question. Was it resolved?