usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Apache License 2.0

Can we get FP4? #5

Open catid opened 6 months ago

catid commented 6 months ago

FP6 doesn't seem like a useful size. The best models we can run are 70B, and only 4-bit models will fit in ~40-48 GB of VRAM.
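
As a rough sanity check of that claim (a sketch added here, not part of the thread): weight memory is roughly parameter count × bits per weight / 8, before counting the KV cache and activations.

```python
# Back-of-the-envelope weight footprint for a 70B-parameter model.
# Assumes weights dominate VRAM; KV cache and activations add more on top.
def weight_gib(params: float, bits: int) -> float:
    """Weight footprint in GiB for `params` parameters at `bits` bits each."""
    return params * bits / 8 / 2**30

for bits in (16, 6, 4):
    print(f"{bits:>2}-bit: {weight_gib(70e9, bits):6.1f} GiB")

# 16-bit: 130.4 GiB -> needs multiple GPUs
#  6-bit:  48.9 GiB -> already over budget on a 48 GB card once overhead is added
#  4-bit:  32.6 GiB -> fits in ~40-48 GB with room left for the KV cache
```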

Summer-Summer commented 6 months ago

We will support FP5 soon. Yeah, I will also try to support FP4.
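
For reference, a common 4-bit float layout is E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit). Whether fp6_llm would adopt this layout is an assumption here, not something stated in the thread; the sketch below just enumerates what such a format can represent.

```python
# Decode a 4-bit E2M1 float code (1 sign, 2 exponent, 1 mantissa bit, bias 1).
# This layout is an illustrative assumption; the repo had not fixed an FP4
# format at the time of this thread.
def decode_fp4_e2m1(code: int) -> float:
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:                     # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

# All non-negative representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6
print(sorted(decode_fp4_e2m1(c) for c in range(8)))
```

With only eight magnitudes per sign, FP4 is coarse enough that it typically relies on per-group scaling factors to preserve accuracy, which is part of why adding it is more than a drop-in change.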