stephen-youn opened 2 years ago
Not that I know of
There is no branch at the moment. If INT4 proves widely useful, we'll add it. Just a bit tricky because AFAIK LLVM doesn't have a type for it.
@ptillet maybe you have seen this recent paper -> https://openreview.net/forum?id=tcbBPnfwxS (link to CUDA code included for reproduction): INT4 quantization of very large models with little hit on performance, and... no need to retrain (🎉)! With bitsandbytes and similar work, I guess these approaches will replace the old QAT pipelines for LLMs at some point.
One limitation, however:

> In terms of limitations, our method currently does not provide speedups for the actual multiplications, due to the lack of hardware support for mixed-precision operands (e.g. FP16 x INT4) on mainstream architectures. Moreover, our current results do not include activation quantization, as they are not a significant bottleneck in our target scenarios; however, this can be supported using orthogonal techniques (Yao et al., 2022).
Is your last comment because you are reusing the types available out of the box from MLIR/LLVM?
@ptillet any updates on if INT4 support is on the roadmap?
My understanding is that H100 doesn't support INT4, so it's not on the roadmap at all. It could be useful as a storage type, though, but I'm hoping we can achieve that via better CUDA interop that lets users directly process 8x int4 values packed into 1x int32.
Thanks for the update @ptillet
Can you elaborate on what you mean by CUDA interop, and how that would relate to storing weights in INT4 and then dequantizing back to INT8/FP16 etc. at runtime?
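For readers unfamiliar with the "8x int4 packed into 1x int32" storage idea mentioned above, here is a minimal plain-Python sketch. The function names, the signed nibble convention, and the per-tensor `scale` dequantization step are illustrative assumptions, not Triton or CUDA APIs:

```python
def pack_int4(values):
    """Pack eight signed 4-bit values (each in [-8, 7]) into one 32-bit word."""
    assert len(values) == 8
    word = 0
    for i, v in enumerate(values):
        assert -8 <= v <= 7
        word |= (v & 0xF) << (4 * i)  # keep only the low 4 bits of each value
    return word

def unpack_int4(word):
    """Recover the eight signed 4-bit values from one 32-bit word."""
    out = []
    for i in range(8):
        nibble = (word >> (4 * i)) & 0xF
        # sign-extend: nibbles 8..15 represent -8..-1 in two's complement
        out.append(nibble - 16 if nibble >= 8 else nibble)
    return out

def dequantize(word, scale):
    """At runtime, expand packed INT4 weights to floats (e.g. for an FP16 matmul)."""
    return [v * scale for v in unpack_int4(word)]

vals = [3, -8, 7, 0, -1, 5, -4, 2]
w = pack_int4(vals)
assert unpack_int4(w) == vals  # lossless round-trip
```

In other words, INT4 acts only as a compact storage format: a kernel would load `int32` words, unpack and dequantize them in registers, and do the actual math in a hardware-supported type such as FP16 or INT8.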
Hi, any updates on the issue? There are some projects showing the viability of using INT4 for compute (e.g. https://github.com/efeslab/Atom), and it would be really awesome if it were supported in Triton.
Are there any recent plans to support INT4? Thank you.
https://gist.github.com/jlebar/3435b2c00deea53258887ce37231e5e2
Hi, is there a plan or ongoing work to enable the int4 data type in Triton on A100? If there is already a branch, can I give it a try? Thanks