ml-explore / mlx

MLX: An array framework for Apple silicon
https://ml-explore.github.io/mlx/
MIT License

Bug fix in quantize #1054

Closed: angeloskath closed this 2 weeks ago

angeloskath commented 2 weeks ago

Quantize was packing the weights with different scales and biases than the ones it returned, so dequantizing with the returned values did not reproduce the packed weights. Not sure if this was introduced when we fixed the NaNs when quantizing a block of 0s.
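To make the invariant concrete, here is a minimal pure-Python sketch of per-group affine quantization. It is not the MLX kernel: `quantize_group`, `dequantize_group`, `max_abs_error`, the 4-bit setting, the sample weights and the 1.1 perturbation factor are all hypothetical and chosen only for illustration. It shows that dequantizing with the same scale and bias used for packing keeps the error within half a quantization step, while dequantizing with a different scale does not.

```python
def quantize_group(w, bits=4):
    """Affine-quantize one group of floats; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels
    if scale == 0.0:
        # Constant group (e.g. all zeros): avoid dividing by zero below.
        # In float array math that division would yield NaN.
        scale = 1.0
    bias = lo
    # Pack with exactly the scale and bias that are returned to the caller.
    codes = [min(levels, max(0, round((x - bias) / scale))) for x in w]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to floats."""
    return [c * scale + bias for c in codes]


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


weights = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0, 0.75]
codes, scale, bias = quantize_group(weights)

# Same scale/bias for packing and unpacking: error stays within scale / 2.
matched_err = max_abs_error(weights, dequantize_group(codes, scale, bias))

# A scale that differs from the one used for packing (hypothetical 10% drift)
# shifts every nonzero code and inflates the error.
mismatched_err = max_abs_error(weights, dequantize_group(codes, scale * 1.1, bias))

# A block of zeros round-trips to zeros thanks to the guard above.
zero_codes, zero_scale, zero_bias = quantize_group([0.0] * 8)
zero_roundtrip = dequantize_group(zero_codes, zero_scale, zero_bias)

print(matched_err, mismatched_err, zero_roundtrip)
```

A round-trip check of this kind, comparing the dequantized weights against the originals, is one way such a mismatch could be caught in a test.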

awni commented 2 weeks ago

Quantizing with this fixes the Phi-3 issue:

python -m mlx_lm.generate --model mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed --prompt "what is birthday of Albert Einstein" --temp 0.0 --max-tokens 100

Produces:

Prompt: <s><|user|>
what is birthday of Albert Einstein<|end|>
<|assistant|>

Albert Einstein was born on March 14, 1879. However, it's important to note that this date is incorrect. Albert Einstein was actually born on March 14, 1879, but in the year 1886, which was a common year starting on Saturday according to the Gregorian calendar. The confusion might arise from the fact that Einstein's date of birth is often celebrated on March 14th

ivanfioravanti commented 2 weeks ago

Top!!! Thanks for the fix!