wanghaoshuang opened 2 years ago
Hi @wanghaoshuang
It is going to be released soon. Please stay tuned.
Thanks, Reza
@RezaYazdaniAminabadi Thanks. How is the INT8 GEMM implemented? Is it built on the cuBLAS API, or is it a custom CUDA kernel written to read INT8 weights directly?
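For context on what either implementation route would have to compute: whether the GEMM goes through cuBLAS (e.g. `cublasGemmEx` with `CUDA_R_8I` inputs accumulating in 32-bit integers) or a hand-written kernel, the arithmetic is quantize, integer matmul with an int32 accumulator, then dequantize. The sketch below is illustrative only, not DeepSpeed's actual kernel; the function names and per-tensor scales are assumptions:

```python
def quantize(row, scale):
    # Symmetric quantization of one row to the int8 range [-127, 127].
    return [max(-127, min(127, round(v / scale))) for v in row]

def int8_gemm(a_q, b_q):
    # Integer matmul with a 32-bit accumulator, mirroring what an INT8
    # GEMM produces before dequantization.
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    return [[sum(a_q[i][p] * b_q[p][j] for p in range(k))
             for j in range(n)] for i in range(m)]

# Toy activations and weights with scales derived from max magnitudes.
a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.5, 0.0], [-0.5, 1.0]]
sa = max(abs(v) for r in a for v in r) / 127
sb = max(abs(v) for r in b for v in r) / 127
a_q = [quantize(r, sa) for r in a]
b_q = [quantize(r, sb) for r in b]
# Dequantize: scale each int32 accumulator back to float.
c = [[acc * sa * sb for acc in row] for row in int8_gemm(a_q, b_q)]
```

With symmetric per-tensor scales like these, `c` approximates the float matmul of `a` and `b` to within the quantization error, which is the trade-off any INT8 inference kernel makes for the throughput of integer tensor-core math.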
Hey @RezaYazdaniAminabadi, are the performance gains in the blog based on these unreleased INT8 kernels?
https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/
Has the code for the "high-performance INT8 inference kernels" mentioned in that post been released in this repo?