Closed gitfabianmeyer closed 5 months ago
I am able to reproduce the error above.
I can also generate a similar error, but it's not related to the cfg. This is the smallest prompt I could craft to force this error. Note that it's "prompt only": no parameters are passed.
python txt2image.py "A stunning vision of a Subterranean Sci-Fi village hidden within a Far Side Observatory Zone. The artstyle features a fusion of retro-futuristic elements reminiscent of Syd Mead's renowned concept art, combined with the surrealistic charm of Salvador Dali's classic paintings. The color palette consists of rich, vibrant hues inspired by the neon lights of Blade Runner, infused with the cool, atmospheric tones of 2001: A Space Odyssey."
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/txt2image.py", line 65, in <module>
    for x_t in tqdm(latents, total=args.steps):
  File "/Users/ovid/miniconda3/lib/python3.11/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/__init__.py", line 245, in generate_latents
    conditioning, pooled_conditioning = self._get_text_conditioning(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/__init__.py", line 217, in _get_text_conditioning
    conditioning_1 = self.text_encoder_1(tokens_1)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/clip.py", line 94, in __call__
    x = x + self.position_embedding.weight[:N]
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ValueError: Shapes (1,90,768) and (77,768) cannot be broadcast.
I have the latest mlx-examples repo and ensured I have all of the latest requirements installed.
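The shapes in the error suggest what is going on: the prompt appears to tokenize to 90 tokens, while the position embedding table has only 77 rows, so `weight[:N]` yields at most 77 rows and cannot be added to a 90-token input. Below is a minimal sketch of one possible workaround, clipping the token list before it reaches the text encoder. The helper `truncate_tokens` and the token ids are hypothetical and not part of mlx-examples; the limit of 77 is read off the error message, and this is not necessarily how the repo handles long prompts.

```python
def truncate_tokens(tokens, max_length=77, eos_token=None):
    """Clip a token sequence to at most max_length entries.

    If eos_token is given and truncation happens, the last kept
    position is overwritten with eos_token so the sequence still
    ends with an end-of-text marker.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if len(tokens) <= max_length:
        return list(tokens)
    clipped = list(tokens[:max_length])
    if eos_token is not None:
        clipped[-1] = eos_token
    return clipped


# Hypothetical token ids (assumption): 0 = start, 1..88 = prompt, 999 = end-of-text.
tokens = [0] + list(range(1, 89)) + [999]  # 90 tokens, as in the traceback
clipped = truncate_tokens(tokens, max_length=77, eos_token=999)
print(len(tokens), len(clipped))  # prints "90 77"
```

Truncation silently drops the tail of the prompt, so anything described after roughly the first 75 prompt tokens would no longer influence the image.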
Sorry for taking so long to address this. The fix is in #667 if you want to use it before it is merged.
It seems that you can't use SD XL with the cfg param > 1:
python txt2image.py "A closeup picture of an elephant with glowing eyes" --n_images 4 --n_rows 2 --verbose --model sdxl --cfg 1.5 --output test_cfg.png
leads to an error.
Edit: Typos