Open col-in-coding opened 7 months ago
It seems that there's no decrease in latency on the llama-13b model when using NF4 quantization.
I'm not sure about its compatibility with bitsandbytes (bnb), but it is already integrated into llama.cpp. You can check that out if you want to run quantized models.
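For reference, here is a minimal sketch of how NF4 quantization is typically enabled through bitsandbytes in the Hugging Face `transformers` API (the model id `huggyllama/llama-13b` is an assumption for illustration). Note that bnb 4-bit quantization mainly reduces memory footprint; dequantization happens at compute time, so lower latency is not guaranteed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 (4-bit NormalFloat) quantization config via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NF4 data type for the quantized weights
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the actual matmuls
)

# Weights are stored in NF4 but dequantized on the fly for compute,
# which is why throughput/latency may not improve over fp16.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",  # assumed model id, replace with your checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```

This is essentially a configuration fragment; it requires a GPU and a downloaded checkpoint to actually run.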