0xdevalias opened this issue 1 year ago
Thanks for sharing this!
Our conclusion still holds for now: AITemplate is still the fastest. Please let us know if you have any other suggestions! We are always looking for ways to improve this.
Thanks for your detailed response :)
ColossalAI's example only accelerates training; this repo focuses on inference.
Is that true? They definitely talk about inference here (though I didn't explore too deeply to see what optimisations are applied):
A bit further down on the page they reference some of the optimisations they make use of:
The implementation of the transformer encoder is from x-transformers by lucidrains.
The implementation of flash attention is from HazyResearch.
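For anyone unfamiliar with the flash attention referenced above: the HazyResearch implementation is a fused CUDA kernel, but the core idea (tiled attention with an online, numerically stable softmax, so the full N×N score matrix is never materialised) can be sketched in plain NumPy. This is purely an illustrative sketch under my own naming and block size, not the actual implementation:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference softmax attention that materialises the full score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention_sketch(Q, K, V, block=4):
    """Process K/V in tiles, keeping a running max and running softmax
    denominator so earlier partial outputs can be rescaled on the fly."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))          # running (unnormalised) output
    m = np.full(n, -np.inf)       # running row-wise max of scores
    l = np.zeros(n)               # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                      # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))     # updated running max
        P = np.exp(S - m_new[:, None])            # tile softmax numerator
        corr = np.exp(m - m_new)                  # rescale old accumulators
        l = l * corr + P.sum(axis=-1)
        O = O * corr[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

The tiled version matches the naive one to numerical precision; the real speedup in the CUDA kernel comes from keeping each tile in SRAM rather than from the maths itself.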
Another one I stumbled upon:
Accelerate inference of 🤗 Transformers with Intel optimization tools
I've been bouncing around various StableDiffusion optimisations the last couple of weeks, and figured I would link out to some of the ones I remember in hopes that they can be explored/added into the benchmarks/comparisons here: