Open chaihahaha opened 5 hours ago
I just downloaded NeverSleep/MiquMaid-v3-70B,
quantized it, and it seems to run fine. Perhaps there is something wrong with the unquantized download?
Thanks for testing. But NeverSleep/MiquMaid-v3-70B
is not the model I downloaded; I downloaded the mlx-converted (unquantized) mlx-community/MiquMaid-v3-70B
and tried to quantize it, without success.
I see, did you dequantize first? Even though it is not in the name, that model is actually quantized at 8 bits.
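One way to confirm this: mlx-community checkpoints record their quantization settings in `config.json`, so a small check like the one below can tell whether a downloaded model is already quantized. This is a sketch, not part of mlx_lm itself; the directory layout and the `quantization` key match what mlx-community conversions typically contain, but verify against your download.

```python
import json
import tempfile
from pathlib import Path

def is_quantized(model_dir: str) -> bool:
    """Return True if the mlx model's config.json records quantization settings."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # mlx-community conversions store e.g. {"quantization": {"group_size": 64, "bits": 8}}
    return "quantization" in config

# Demo with a synthetic config.json standing in for a real model directory:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(
        json.dumps({"model_type": "llama", "quantization": {"group_size": 64, "bits": 8}})
    )
    print(is_quantized(d))  # True: this checkpoint is already quantized
```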
After quantizing mlx-community/miqumaid-v3-70b with this command:

```
mlx_lm.convert --hf-path miqumaid-v3-70b --mlx-path miqumaid-v3-70b-4bit -q --q-bits 4
```

the resulting model miqumaid-v3-70b-4bit cannot be served with mlx_lm.server; it fails with:

```
ValueError: [dequantize] The matrix should be given as a uint32
```
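That error is consistent with requantizing an already-quantized checkpoint. If so, dequantizing first may resolve it; a hedged sketch of the workflow (paths are hypothetical, and the `--dequantize` / `--q-bits` flag names should be double-checked against `mlx_lm.convert --help` for your installed version):

```shell
# Dequantize the 8-bit mlx-community checkpoint back to full precision first
mlx_lm.convert --hf-path mlx-community/MiquMaid-v3-70B \
    --mlx-path miqumaid-v3-70b-fp16 --dequantize

# Then quantize the full-precision weights down to 4 bits
mlx_lm.convert --hf-path miqumaid-v3-70b-fp16 \
    --mlx-path miqumaid-v3-70b-4bit -q --q-bits 4
```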