microsoft / VPTQ

VPTQ: a flexible, extreme low-bit quantization algorithm
MIT License

New Quantized Model Request #78

Closed: JoesSattes closed this issue 3 weeks ago

JoesSattes commented 3 weeks ago

Thank you for your valuable contributions to the community; your work looks great! I came across this 70B model: Llama-3.1-Nemotron-70B-Instruct, and its benchmark results are impressive. Could you please provide the quantized weights for this model?

OpenSourceRonin commented 3 weeks ago

Haha, I'm running the quantization of this model right now. I will release it in the next two days.

JoesSattes commented 3 weeks ago

That’s fantastic news! Thank you for the update.

OpenSourceRonin commented 3 weeks ago

Hi @JoesSattes ,

The new VPTQ Llama 3.1 Nemotron 70B Instruct HF without finetune is now available for download at https://github.com/microsoft/VPTQ?tab=readme-ov-file#evaluation and https://huggingface.co/collections/VPTQ-community/vptq-llama-31-nemotron-70b-instruct-hf-without-finetune-671730b96f16208d0b3fe942.

You are welcome to try it out! We are continuing to work on quantizing the 4-6 bit versions. Please stay tuned!