ml-explore / mlx-examples

Examples in the MLX framework
MIT License

SDXL Turbo + CFG Param #629

Closed: gitfabianmeyer closed this issue 5 months ago

gitfabianmeyer commented 5 months ago

It seems that you can't use SDXL with the cfg param > 1:

python txt2image.py "A closeup picture of an elephant with glowing eyes" --n_images 4 --n_rows 2 --verbose --model sdxl --cfg 1.5 --output test_cfg.png

leads to

temb = temb + emb
           ~~~~~^~~~~
ValueError: Shapes (8,1280) and (2,1280) cannot be broadcast.

Edit: Typos
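For anyone reading the error above: the two shapes differ in the leading (batch) dimension, 8 versus 2, and neither is 1, so they cannot be broadcast. Below is a minimal sketch of the shape check in plain Python, assuming MLX follows NumPy-style broadcasting. The helper names `broadcast_shapes` and `can_broadcast` are made up for illustration. The explanation of where 8 and 2 come from (4 images times 2 guidance passes, versus an embedding that is not repeated per image) is a guess from the command line and the shapes in the traceback, not something confirmed in this thread.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    NumPy-style rule: align shapes from the trailing dimension; two sizes
    are compatible if they are equal or if one of them is 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"Shapes {a} and {b} cannot be broadcast.")
    return tuple(reversed(result))


def can_broadcast(a, b):
    """Convenience wrapper that returns a bool instead of raising."""
    try:
        broadcast_shapes(a, b)
        return True
    except ValueError:
        return False


n_images = 4    # from --n_images 4
cfg_copies = 2  # assumption: cfg > 1 adds an unconditional pass per image

temb_shape = (n_images * cfg_copies, 1280)  # (8, 1280), as in the error
emb_shape = (cfg_copies, 1280)              # (2, 1280), as in the error

# The reported combination is rejected because 8 != 2 and neither is 1.
reported_ok = can_broadcast(temb_shape, emb_shape)

# Assumption: repeating the embedding once per image would make the
# batch dimensions agree.
repeated_emb_shape = (emb_shape[0] * n_images, 1280)
repeated_ok = can_broadcast(temb_shape, repeated_emb_shape)
```

With `--cfg 1.0` the batch would presumably not be doubled, which would be consistent with the error only showing up for cfg > 1.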

Ovid commented 5 months ago

I am able to reproduce the error above.

I can also generate a similar error that is not related to cfg. This is the smallest prompt I could craft to force it. Note that it's "prompt only": no parameters are passed.

python txt2image.py "A stunning vision of a Subterranean Sci-Fi village hidden within a Far Side Observatory Zone. The artstyle features a fusion of retro-futuristic elements reminiscent of Syd Mead's renowned concept art, combined with the surrealistic charm of Salvador Dali's classic paintings. The color palette consists of rich, vibrant hues inspired by the neon lights of Blade Runner, infused with the cool, atmospheric tones of 2001: A Space Odyssey."
  0%|                                                                                                           | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/txt2image.py", line 65, in <module>
    for x_t in tqdm(latents, total=args.steps):
  File "/Users/ovid/miniconda3/lib/python3.11/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/__init__.py", line 245, in generate_latents
    conditioning, pooled_conditioning = self._get_text_conditioning(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/__init__.py", line 217, in _get_text_conditioning
    conditioning_1 = self.text_encoder_1(tokens_1)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ovid/projects/llms/mlx-examples/stable_diffusion/stable_diffusion/clip.py", line 94, in __call__
    x = x + self.position_embedding.weight[:N]
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ValueError: Shapes (1,90,768) and (77,768) cannot be broadcast.

I have the latest mlx-examples repo and ensured I have all of the latest requirements installed.
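Reading the second traceback: the prompt tokenizes to 90 tokens, but the position embedding table has only 77 rows, so `weight[:N]` with N = 90 still yields 77 rows and the addition fails. One possible workaround is to clip the token list before it reaches the text encoder. The sketch below is only an illustration under stated assumptions: `truncate_tokens` is a hypothetical helper, not a function in the repo; the limit of 77 is taken from the shape in the traceback; and the end-of-text id 49407 is what I believe the CLIP vocabulary uses, so treat it as an assumption. This is not necessarily what the actual fix does.

```python
def truncate_tokens(tokens, max_length=77, eos_token=49407):
    """Clip a token list to max_length, keeping an end-of-text token last.

    max_length: number of rows in the position embedding (77 in the traceback).
    eos_token:  assumed end-of-text id; verify against the tokenizer in use.
    """
    tokens = list(tokens)
    if len(tokens) <= max_length:
        # Short prompts pass through unchanged.
        return tokens
    # Keep the first max_length - 1 tokens and close with end-of-text.
    return tokens[: max_length - 1] + [eos_token]


# Stand-in for the 90-token prompt from the report (ids are dummy values).
long_prompt_tokens = list(range(90))
clipped = truncate_tokens(long_prompt_tokens)
```

The trade-off is that anything past the limit is silently dropped, so the tail of a long prompt would have no effect on the image.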

angeloskath commented 5 months ago

Sorry for taking so long to address this. The fix is in #667 if you want to use it before it is merged.