Open col-in-coding opened 7 months ago
It seems that there's no decrease in latency on the llama-13b model when using NF4 quantization.
I'm not sure about its compatibility with bitsandbytes (bnb), but it is already integrated into llama.cpp. You can check that out if you want to run quantized models.
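For reference, here is a minimal sketch of how NF4 quantization is typically enabled through bitsandbytes in the Hugging Face `transformers` API (the model id `huggyllama/llama-13b` is an assumption for illustration). Note that bnb 4-bit quantization mainly reduces memory footprint; dequantization happens at compute time, so lower latency is not guaranteed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 (4-bit NormalFloat) quantization config via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NF4 data type for the quantized weights
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the actual matmuls
)

# Weights are stored in NF4 but dequantized on the fly for compute,
# which is why throughput/latency may not improve over fp16.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",  # assumed model id, replace with your checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```

This is essentially a configuration fragment; it requires a GPU and a downloaded checkpoint to actually run.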