huggingface / diffusion-fast

Faster generation with text-to-image diffusion models.
https://pytorch.org/blog/accelerating-generative-ai-3/
Apache License 2.0

Add CPU support and update README #13

Closed: jiayisunx closed this 6 months ago

sayakpaul commented 6 months ago

A 4x latency improvement is massive. You don't need any extra setup to achieve this feat, right? If so, it would be nice to document the steps as thoroughly as possible.

jiayisunx commented 6 months ago

> A 4x latency improvement is massive. You don't need any extra setup to achieve this feat, right? If so, it would be nice to document the steps as thoroughly as possible.

Yes, it's a pure, native PyTorch environment. Dynamic int8 quantization currently has a functionality issue on CPU, which we are still working on. We may share more experimental results once all of the optimizations can be applied.
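For illustration, a minimal sketch of what that pure-PyTorch CPU setup might look like (the model id, prompt, and step count below are placeholders, not the repo's benchmark configuration):

```python
# A minimal sketch (not the repo's benchmark script) of the pure, native
# PyTorch CPU setup described above: BFloat16 + SDPA + torch.compile.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,  # run the whole pipeline in BFloat16
).to("cpu")

# SDPA (scaled_dot_product_attention) is the default attention backend in
# PyTorch 2.x, so no extra setup is needed for it; torch.compile then lowers
# the UNet through the inductor CPU backend.
pipe.unet = torch.compile(pipe.unet)

# Dynamic int8 quantization is intentionally left out here: it still has a
# functionality issue on CPU, as noted in the comment above.
image = pipe(
    "a photo of an astronaut riding a horse on mars",
    num_inference_steps=30,
).images[0]
```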

sayakpaul commented 6 months ago

Okay, then we'd want to make that explicitly clear in the README. Otherwise, it's still incomplete in my opinion.

jiayisunx commented 6 months ago

> Okay, then we'd want to make that explicitly clear in the README. Otherwise, it's still incomplete in my opinion.

Can you be more specific about what you would like me to add to the README?

sayakpaul commented 6 months ago

The current changes are fine, except that we're not specifying what you told me here: https://github.com/huggingface/diffusion-fast/pull/13#issuecomment-2112234531

So this makes it incomplete.

jiayisunx commented 6 months ago

> The current changes are fine, except that we're not specifying what you told me here: #13 (comment)
>
> So this makes it incomplete.

I have noted that these optimizations (BFloat16, SDPA, torch.compile, combining the q,k,v projections) can run on CPU platforms, and that the dynamic int8 quantization optimization is not included here.
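For reference, a hedged sketch of the last optimization on that list, combining the q,k,v projections; it assumes a diffusers version that exposes fuse_qkv_projections() on the SDXL pipeline:

```python
# A sketch of the "combining q,k,v projections" optimization mentioned above,
# assuming a diffusers release that provides fuse_qkv_projections().
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
).to("cpu")

# Replace the three separate q, k, v projection matmuls in each attention
# block with a single fused matmul, reducing kernel launches and improving
# cache behavior on CPU.
pipe.fuse_qkv_projections()
```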

sayakpaul commented 6 months ago

Sorry, that was my oversight. Thanks!