stochasticai / x-stable-diffusion

Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6
https://stochastic.ai
Apache License 2.0

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

Open 0xdevalias opened 1 year ago

0xdevalias commented 1 year ago

I've been bouncing around various StableDiffusion optimisations the last couple of weeks, and figured I would link out to some of the ones I remember in hopes that they can be explored/added into the benchmarks/comparisons here:
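Since the ask here is to fold these libraries into the repo's latency benchmarks, here is a minimal, hedged sketch of the kind of timing harness such a comparison needs. The `pipeline` callable is a hypothetical stand-in for one inference call on whichever backend is under test (AITemplate, TensorRT, xformers, etc.); nothing here is taken from this repo's actual benchmark code.

```python
import time
from statistics import mean

def benchmark_latency(pipeline, n_warmup=2, n_runs=10):
    """Measure mean wall-clock latency of `pipeline` over `n_runs` calls.

    `pipeline` is a placeholder for a single inference call on the
    backend being benchmarked. Warm-up runs are excluded so one-time
    costs (kernel compilation, caching) don't skew the numbers.
    """
    for _ in range(n_warmup):
        pipeline()  # warm-up: discard JIT/compile/cache effects
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        pipeline()
        latencies.append(time.perf_counter() - start)
    return mean(latencies)

if __name__ == "__main__":
    # Dummy CPU workload standing in for a real diffusion pipeline call.
    dummy = lambda: sum(i * i for i in range(10_000))
    print(f"mean latency: {benchmark_latency(dummy):.6f}s")
```

Running each candidate backend through the same harness (same prompt, steps, and resolution) is what makes numbers like the 0.88s headline figure comparable across entries.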

glennko commented 1 year ago

Thanks for sharing this!

Our conclusion still holds for now and AITemplate is still the fastest. Please let us know if you have any other suggestions! We are looking for ways to improve this.

0xdevalias commented 1 year ago

Thanks for your detailed response :)

> Colossalai's example only accelerates training, this repo focuses on inference.

Is that true? They definitely talk about inference here (though I didn't explore too deeply to see what optimisations are applied):

A bit further down on the page they reference some of the optimisations they make use of:

0xdevalias commented 1 year ago

Another one I stumbled upon: