huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

How to manually quantize a phi-2 model, starting from safetensors files #1457

Closed fcn94 closed 7 months ago

fcn94 commented 7 months ago

Hi

I have fine tuned a phi-2 model using lora

I merged adapter with base model to get a trained one

I now have a bunch of safetensors file

How is it possible to convert these files into a gguf file? (The llama.cpp converter does not support phi.)

In other words, how can I achieve the same result as model-v2-q4k.gguf in lmz/candle-quantized-phi?

LaurentMazare commented 7 months ago

You can try something like the following. The tensor-tools binary contains various tools for manipulating tensor files including quantizing them.

cargo run --example tensor-tools --release -- quantize --quantization q4k model.safetensors /tmp/model.gguf

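A minimal sketch of the full workflow, assuming a local checkout of the candle repository and placeholder file paths (model.safetensors, /tmp/model.gguf). The ls subcommand and the alternative quantization names are taken from tensor-tools' CLI as I understand it; check tensor-tools --help in your checkout to confirm.

```shell
# Optional sanity check: list the tensors in the merged safetensors file.
cargo run --example tensor-tools --release -- ls model.safetensors

# Quantize to 4-bit (q4k). Other schemes such as q8_0 may also be
# accepted by --quantization, depending on the candle version.
cargo run --example tensor-tools --release -- quantize \
  --quantization q4k model.safetensors /tmp/model.gguf

# Verify the resulting gguf by listing its (now quantized) tensors.
cargo run --example tensor-tools --release -- ls /tmp/model.gguf
```

If the fine-tuned model is sharded across several safetensors files, they may need to be merged into a single file first, unless your version of tensor-tools accepts multiple input files on the quantize command.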
fcn94 commented 7 months ago

That was fast. Thanks for the answer. I will try it.

fcn94 commented 7 months ago

Seems to work OK. Thanks again!