Describe the bug
Kandinsky 3.0 fails when you attempt to pass embeds rather than prompts.
Since the text model for K3.0 is so heavy, this is probably needed more than usual to reduce memory usage and improve speed; you really don't want to encode the text prompt multiple times if you can avoid it.
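Judging from the traceback under Logs below, attention_mask in encode_prompt is only assigned on the branch that tokenizes a raw prompt, so supplying precomputed embeds skips the assignment and the later repeat() hits an unbound local. A minimal sketch of that control-flow pattern (simplified, not the actual diffusers source; tokenize is a hypothetical stand-in):

def encode_prompt(prompt=None, prompt_embeds=None, num_images_per_prompt=1):
    if prompt_embeds is None:
        # attention_mask is only ever assigned inside this branch
        tokens = tokenize(prompt)  # hypothetical stand-in for the real tokenizer call
        attention_mask = tokens.attention_mask
    # with prompt_embeds supplied, the branch above is skipped and this
    # line raises UnboundLocalError
    attention_mask = attention_mask.repeat(num_images_per_prompt, 1)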
Reproduction
from diffusers import AutoPipelineForText2Image
import torch
import gc

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe = pipe.to('mps')

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

# Encode once up front; the second positional argument is do_classifier_free_guidance.
prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = pipe.encode_prompt(
    prompt,
    True,
    device=pipe.device
)

# The goal: drop the heavy text encoder to free memory before denoising.
#pipe.text_encoder = None
#pipe.tokenizer = None
#gc.collect()
#torch.mps.empty_cache()
#gc.collect()
#torch.mps.empty_cache()

generator = torch.Generator(device="cpu").manual_seed(42)
# This call raises the UnboundLocalError shown in Logs below.
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, num_inference_steps=25, generator=generator).images[0]
image.save('k3.png')  # .images[0] is already a single PIL image
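As noted above, the whole point of passing embeds is to reuse them: once cached, the same tensors should drive any number of generations with the text encoder gone. A sketch of the loop this bug currently blocks (hypothetical seeds, same kwargs as the repro above):

for seed in (1, 2, 3):
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, num_inference_steps=25, generator=generator).images[0]
    image.save(f'k3_{seed}.png')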
Logs
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/Diffusers/k3.py", line 26, in <module>
    image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, num_inference_steps=25, generator=generator).images[0]
  File "/Volumes/SSD2TB/AI/Kandinsky3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/SSD2TB/AI/Kandinsky3/lib/python3.10/site-packages/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py", line 366, in __call__
    prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = self.encode_prompt(
  File "/Volumes/SSD2TB/AI/Kandinsky3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/SSD2TB/AI/Kandinsky3/lib/python3.10/site-packages/diffusers/pipelines/kandinsky3/kandinsky3_pipeline.py", line 153, in encode_prompt
    attention_mask = attention_mask.repeat(num_images_per_prompt, 1)
UnboundLocalError: local variable 'attention_mask' referenced before assignment
System Info
diffusers version: 0.24.0.dev0
Platform: macOS-14.1.1-arm64-arm-64bit
Python version: 3.10.13
PyTorch version (GPU?): 2.1.1 (False)
Huggingface_hub version: 0.19.4
Transformers version: 4.35.2
Accelerate version: 0.24.1
xFormers version: not installed
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No
Who can help?
No response

Hi, tested the new merge and it solves the issue as expected.
I can now chuck the text encoder out of memory and get a big speed-up even when not looping the calls to the pipe.
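For anyone landing here later, the pattern that worked for me after the fix, assuming your diffusers build's pipeline call accepts the attention masks that encode_prompt returns (check your version; names mirror the repro above):

prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = pipe.encode_prompt(
    prompt,
    True,
    device=pipe.device
)
pipe.text_encoder = None  # drop the heavy text encoder to free memory
pipe.tokenizer = None
gc.collect()
torch.mps.empty_cache()

generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    attention_mask=attention_mask,
    negative_attention_mask=negative_attention_mask,
    num_inference_steps=25,
    generator=generator,
).images[0]
image.save('k3.png')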