tracel-ai / cubecl

Multi-platform high-performance compute language extension for Rust.
https://burn.dev
Apache License 2.0
584 stars, 24 forks

Support Int4/Int8 types #162

Open AntiAnimeGeneral opened 1 week ago

AntiAnimeGeneral commented 1 week ago

It is difficult to run LLMs with f32/f16 on a PC. To run LLM inference on the edge, Q4 quantization is almost a necessity. Perhaps Int4 could be supported as a built-in type.

nathanielsimard commented 4 days ago

We can't upload int8 or int4 to the GPU, but @laggui is working on quantization in Burn. We will probably create abstractions that make it easier to write quantized kernels.
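
For illustration only, here is a minimal host-side sketch of one common workaround when a 4-bit type can't be uploaded directly: quantize to 4-bit values and pack eight of them into each `u32`, then unpack inside the kernel. This is not the CubeCL or Burn API, and it is not necessarily the design the maintainers will choose. The function names (`quantize_q4`, `pack_q4`, `unpack_q4`, `dequantize_q4`) and the symmetric single-scale scheme are assumptions made up for this example.

```rust
/// Symmetric quantization of f32 values to signed 4-bit integers in [-8, 7],
/// using one scale for the whole slice (hypothetical scheme, for illustration).
fn quantize_q4(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero when every value is 0.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-8.0, 7.0) as i8)
        .collect();
    (q, scale)
}

/// Pack 4-bit values into u32 words, eight per word, lowest nibble first.
/// A trailing partial chunk is padded with zero nibbles.
fn pack_q4(q: &[i8]) -> Vec<u32> {
    q.chunks(8)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u32, |word, (i, &v)| {
                // Keep only the low 4 bits (two's complement nibble).
                word | (((v as u32) & 0xF) << (4 * i))
            })
        })
        .collect()
}

/// Unpack `len` 4-bit values from u32 words and sign-extend each to i8.
fn unpack_q4(packed: &[u32], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let nibble = ((packed[i / 8] >> (4 * (i % 8))) & 0xF) as u8;
            // Move the nibble to the top of the byte, then shift back
            // arithmetically so the sign bit is propagated.
            ((nibble << 4) as i8) >> 4
        })
        .collect()
}

/// Map quantized values back to approximate f32 values.
fn dequantize_q4(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

With this layout the buffer sent to the GPU is a plain `u32` array plus one `f32` scale, and a kernel would apply the same shift-and-mask logic as `unpack_q4` before multiplying by the scale.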