liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Bug in sdxl_txt2img example #52

Closed (ghost closed this issue 9 months ago)

ghost commented 9 months ago

To reproduce:
1) Make a fresh clone on an M1 Mac.
2) Change the WORKSPACE etc., according to the README.
3) Download all models from https://static.libnnc.org/ and replace the hardcoded model paths.
4) Run: bazel run examples:sdxl_txt2img --compilation_mode=opt

I get this image: txt2img

ghost commented 9 months ago

Any idea what is going on? It looks like a channel-ordering problem to me, as if something is being reshaped the wrong way.

liuliu commented 9 months ago

The code should work with CUDA. On Mac, you need to turn off MFA (Metal FlashAttention), because its batched GEMM implementation expects NHWC layout:

DynamicGraph.flags = [.disableMetalFlashAttention]
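
To make the layout point concrete, here is a minimal, self-contained sketch of what happens when a buffer written in NCHW order is read as if it were NHWC. It is only an illustration of a generic layout mismatch, not the actual MFA GEMM code; the helper names `offsetNCHW` and `offsetNHWC` and the tiny 2-channel image are made up for this example.

```swift
// Flat-buffer offset of element (c, h, w) under each layout, for batch size 1.
// These helpers are hypothetical and exist only for this illustration.
func offsetNCHW(c: Int, h: Int, w: Int, channels: Int, height: Int, width: Int) -> Int {
  return (c * height + h) * width + w
}

func offsetNHWC(c: Int, h: Int, w: Int, channels: Int, height: Int, width: Int) -> Int {
  return (h * width + w) * channels + c
}

// A 2-channel, 2x2 image stored in NCHW order:
// channel 0 holds 0...3, channel 1 holds 10...13.
let channels = 2, height = 2, width = 2
let nchwBuffer: [Int] = [0, 1, 2, 3, 10, 11, 12, 13]

// Read channel 0 back twice: once with the matching layout,
// once wrongly assuming the buffer is NHWC.
var correct: [Int] = []
var misread: [Int] = []
for h in 0..<height {
  for w in 0..<width {
    correct.append(nchwBuffer[offsetNCHW(c: 0, h: h, w: w, channels: channels, height: height, width: width)])
    misread.append(nchwBuffer[offsetNHWC(c: 0, h: h, w: w, channels: channels, height: height, width: width)])
  }
}

print(correct)  // [0, 1, 2, 3]
print(misread)  // [0, 2, 10, 12]
```

The misread channel mixes values from both channels, which is the kind of scrambling that can show up as a garbled image.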

ghost commented 9 months ago

This is the output I get when I do that: txt2img

liuliu commented 9 months ago

That's weird. The code worked for me on NVIDIA platforms (as you can see, I repurposed the code to run SSD-1B models). You might want to use --compilation_mode=dbg so that any failing assertions are triggered first.

Also double-check that all the models exist. I checked the code in the repo: one of the CLIP encoders uses clip_vit_l14_f32.ckpt rather than the f16 variant that is available, so you might need to change the path a little (although I'm not sure whether a broken CLIP encoder would result in this image).
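
A quick way to do that check is a small standalone script along these lines. This is a sketch under assumptions, not part of the repo: the `missingFiles(in:)` helper and the `/path/to/models` directory are hypothetical, and the list should be extended with every path the example actually loads.

```swift
import Foundation

// Returns the entries of `paths` that do not point to an existing regular file.
// Hypothetical helper for illustration; it is not part of swift-diffusion.
func missingFiles(in paths: [String]) -> [String] {
  return paths.filter { path in
    var isDirectory: ObjCBool = false
    let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
    return !exists || isDirectory.boolValue
  }
}

// Assumed download location; replace with wherever the checkpoints were saved.
let modelDirectory = "/path/to/models"
// Only the file name mentioned in this thread is listed; add the other model paths here.
let expectedModels = ["clip_vit_l14_f32.ckpt"]

let missing = missingFiles(in: expectedModels.map { modelDirectory + "/" + $0 })
for path in missing {
  print("missing model file: \(path)")
}
```

Any path it prints is one the example would fail to load as written, so either download that file or point the code at the variant you do have.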

ghost commented 9 months ago

Right, that was it, thanks.