Running the consistency decoder takes several seconds, and most of that time is spent in what looks like a stalled state; reducing the number of diffusion steps gives no meaningful speedup. The default SD 1.5 decoder is ~100x faster when running the code example in the README.
I'm on PyTorch 2.0.1, Linux kernel 6.1, with an RTX 3060.
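For reference, here is roughly how I measured the two decoders. This is a minimal generic timing sketch, not the exact script I ran: `fn` stands in for a call to either decoder, and the optional `sync` callable (e.g. `torch.cuda.synchronize`) is needed so asynchronous GPU work is actually included in the measurement rather than making one decoder look artificially fast.

```python
import time

def time_fn(fn, sync=None, warmup=1, iters=5):
    """Return the mean wall-clock time per call of fn, in seconds.

    sync, if given, is called before the clock stops (e.g.
    torch.cuda.synchronize) so queued asynchronous GPU work is
    counted in the measurement.
    """
    # Warm-up runs exclude one-time costs (kernel compilation, caches).
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters
```

With this harness, timing each decoder's `decode` call under the same input gives the ~100x gap reported above.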