AttributeError: 'NoneType' object has no attribute 'tokenize' for InstructP2P-SDXL

liming-ai commented 5 months ago

Describe the bug

Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  7.77it/s]
Traceback (most recent call last):
  File "/mnt/bn/sijie-us-nas-8/liming/code/diffusers/test.py", line 12, in <module>
    edited_image = pipe(
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_instruct_pix2pix.py", line 800, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_instruct_pix2pix.py", line 311, in encode_prompt
    prompt = self.maybe_convert_prompt(prompt, tokenizer)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 138, in maybe_convert_prompt
    prompts = [self._maybe_convert_prompt(p, tokenizer) for p in prompts]
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 138, in <listcomp>
    prompts = [self._maybe_convert_prompt(p, tokenizer) for p in prompts]
  File "/usr/local/lib/python3.9/dist-packages/diffusers/loaders/textual_inversion.py", line 161, in _maybe_convert_prompt
    tokens = tokenizer.tokenize(prompt)
AttributeError: 'NoneType' object has no attribute 'tokenize'
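
The traceback shows `tokenizer` arriving as `None` in `_maybe_convert_prompt`, which suggests the checkpoint does not ship that component. A minimal sketch of a pre-flight check follows. `find_missing_components` is a hypothetical helper, not part of diffusers, and the attribute names are assumed from the SDXL pipeline components discussed in this thread.

```python
from types import SimpleNamespace

# Hypothetical helper (not a diffusers API): list the text components
# that are unset (None) on a loaded pipeline object.
def find_missing_components(
    pipe,
    names=("tokenizer", "tokenizer_2", "text_encoder", "text_encoder_2"),
):
    return [name for name in names if getattr(pipe, name, None) is None]

# Stand-in object that mimics a pipeline loaded without the first
# tokenizer/text encoder pair; a real pipeline would be passed instead.
fake_pipe = SimpleNamespace(
    tokenizer=None,
    tokenizer_2=object(),
    text_encoder=None,
    text_encoder_2=object(),
)
missing = find_missing_components(fake_pipe)
print(missing)
```

Running this on the loaded pipeline before calling it would show which components are `None` without having to read the traceback.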

Reproduction

This is an official example from HQ-Edit

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline
from diffusers.utils import load_image
resolution = 768
image = load_image(
    "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
).resize((resolution, resolution))
edit_instruction = "Turn sky into a cloudy one"
pipe = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
    "UCSC-VLAA/HQ-Edit", torch_dtype=torch.float16
).to("cuda")
edited_image = pipe(
    prompt=edit_instruction,
    image=image,
    height=resolution,
    width=resolution,
    guidance_scale=3.0,
    image_guidance_scale=1.5,
    num_inference_steps=30,
).images[0]
edited_image.save("edited_image.png")

Logs

No response

System Info


🤗 Diffusers version: 0.28.0.dev0
Platform: Linux-5.4.143.bsk.8-amd64-x86_64-with-glibc2.31
Running on a notebook?: No
Running on Google Colab?: No
Python version: 3.9.2
PyTorch version (GPU?): 2.1.0+cu121 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.23.1
Transformers version: 4.41.1
Accelerate version: 0.30.1
PEFT version: not installed
Bitsandbytes version: not installed
Safetensors version: 0.4.3
xFormers version: not installed
Accelerator: NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB NVIDIA A100-SXM4-80GB, 81251 MiB VRAM
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help?

No response

sayakpaul commented 5 months ago

Would you mind opening an issue on their model repository? 'Cause this code seems to be the official one.

yiyixuxu commented 5 months ago

@sayakpaul

Do you know why we make text_encoder/tokenizer optional and text_encoder_2/tokenizer_2 required? I think it's more natural to have text_encoder_2/tokenizer_2 optional, no?

does it make sense for them to be interchangeable? i.e.

if self.tokenizer is not None and self.tokenizer_2 is not None:
    tokenizers = [self.tokenizer, self.tokenizer_2]
elif self.tokenizer is not None:
    tokenizers = [self.tokenizer]
else:
    tokenizers = [self.tokenizer_2]

https://github.com/huggingface/diffusers/blob/1d9a6a81b9a00b2dfab992ba75eb55bf8afb4eae/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L363
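
As a self-contained sketch of the interchangeable selection proposed above: `select_tokenizers` is a hypothetical standalone function, not the library's `encode_prompt()` code, and it uses the `tokenizer_2` attribute name from the SDXL pipelines. The `ValueError` for the case where both are missing is an added assumption that the snippet in this comment does not cover.

```python
from types import SimpleNamespace

# Hypothetical helper: return whichever tokenizers are present,
# preserving the order (tokenizer, tokenizer_2).
def select_tokenizers(pipe):
    candidates = [
        getattr(pipe, "tokenizer", None),
        getattr(pipe, "tokenizer_2", None),
    ]
    tokenizers = [tok for tok in candidates if tok is not None]
    if not tokenizers:
        # Assumption: failing early is preferable to passing None along.
        raise ValueError("Pipeline has neither tokenizer nor tokenizer_2.")
    return tokenizers

# Stand-in objects; strings take the place of real tokenizer instances.
both = SimpleNamespace(tokenizer="tok_1", tokenizer_2="tok_2")
only_second = SimpleNamespace(tokenizer=None, tokenizer_2="tok_2")
print(select_tokenizers(both))
print(select_tokenizers(only_second))
```

The same pattern would have to be mirrored for the text encoders so that each tokenizer stays paired with its encoder.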

sayakpaul commented 5 months ago

Yes. I will take a look.

sayakpaul commented 5 months ago

@yiyixuxu I took a look. The changes you suggested would warrant a library-wide rewrite of the SDXL encode_prompt() method, I think.

I think one of the reasons text_encoder_2 / tokenizer_2 weren't made optional is the refiner component: it has no tokenizer / text_encoder, only text_encoder_2 / tokenizer_2.

sayakpaul commented 5 months ago

Okay, it should have been StableDiffusionInstructPix2PixPipeline and not the SDXL variant.
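
Following that comment, the reproduction script could be adapted roughly as below. This is an untested sketch that assumes the "UCSC-VLAA/HQ-Edit" checkpoint loads with the non-SDXL pipeline, as the comment suggests. `edit_with_sd_instruct_pix2pix` is a hypothetical wrapper name. The `height` and `width` arguments from the original script are dropped because the image is already resized before the call.

```python
# Hypothetical wrapper around the reproduction script, switched to the
# non-SDXL InstructPix2Pix pipeline named in the comment above.
def edit_with_sd_instruct_pix2pix(
    image_url,
    instruction,
    model_id="UCSC-VLAA/HQ-Edit",
    resolution=768,
    device="cuda",
):
    # Heavy imports stay local so that defining the function is cheap.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline
    from diffusers.utils import load_image

    image = load_image(image_url).resize((resolution, resolution))
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)
    result = pipe(
        prompt=instruction,
        image=image,
        guidance_scale=3.0,
        image_guidance_scale=1.5,
        num_inference_steps=30,
    )
    return result.images[0]

# Example call (downloads the checkpoint and needs a GPU):
# edited = edit_with_sd_instruct_pix2pix(
#     "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png",
#     "Turn sky into a cloudy one",
# )
# edited.save("edited_image.png")
```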

ALR-alr commented 5 months ago

I have the same question. Has it been resolved?
