Closed angeloskath closed 2 weeks ago
Quantizing with this fixes the Phi-3 issue:
python -m mlx_lm.generate --model mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed --prompt "what is birthday of Albert Einstein" --temp 0.0 --max-tokens 100
Produces:
Prompt: <s><|user|>
what is birthday of Albert Einstein<|end|>
<|assistant|>
Albert Einstein was born on March 14, 1879. However, it's important to note that this date is incorrect. Albert Einstein was actually born on March 14, 1879, but in the year 1886, which was a common year starting on Saturday according to the Gregorian calendar. The confusion might arise from the fact that Einstein's date of birth is often celebrated on March 14th
Top!!! Thanks for the fix!
Quantize was packing the weights with different scales and biases than the ones it returned, so dequantizing with the returned values did not reproduce the packed weights. Not sure whether this was introduced when we fixed the NaNs produced by quantizing a block of zeros.
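To make the failure mode concrete, here is a minimal pure-Python sketch of min/max affine quantization for a single group. It is not the MLX kernel. The helper names (`affine_params`, `pack`, `dequantize`), the sample weights, the perturbed packing parameters, and the constant-group guard are all assumptions made for illustration. It only shows that packing with one set of scales and biases while returning another makes the dequantized weights drift from the originals.

```python
def affine_params(group, bits=4):
    """Hypothetical helper: scale and bias for min/max affine quantization."""
    levels = (1 << bits) - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / levels
    if scale == 0.0:
        # Assumed guard for a constant group (e.g. all zeros), so that
        # packing never divides by zero. The real fix may differ.
        scale = 1.0
    return scale, lo


def pack(group, scale, bias, bits=4):
    """Map each weight to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return [min(levels, max(0, round((w - bias) / scale))) for w in group]


def dequantize(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [q * scale + bias for q in codes]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


weights = [-0.8, -0.3, 0.0, 0.1, 0.45, 0.7]
scale, bias = affine_params(weights)

# Consistent: the parameters used for packing are the ones returned.
codes = pack(weights, scale, bias)
consistent_err = max_error(weights, dequantize(codes, scale, bias))

# Mismatched: pack with slightly different parameters than the returned
# ones (the perturbation here is arbitrary, chosen only for illustration).
codes_bad = pack(weights, scale * 1.25, bias + 0.1)
mismatched_err = max_error(weights, dequantize(codes_bad, scale, bias))

# A block of zeros should round-trip without NaNs under the assumed guard.
zero_scale, zero_bias = affine_params([0.0, 0.0, 0.0])
zeros_back = dequantize(pack([0.0, 0.0, 0.0], zero_scale, zero_bias),
                        zero_scale, zero_bias)

print(consistent_err, mismatched_err, zeros_back)
```

In the consistent case the reconstruction error stays within half a quantization step. In the mismatched case it grows well beyond that, which is the kind of silent degradation that could explain the garbled generations.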