ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Phi-3 q4 systematic wrong token in first date #727

Closed: ivanfioravanti closed this issue 6 months ago

ivanfioravanti commented 6 months ago

While testing Phi-3, I saw very strange behaviour in MLX that does not occur in ollama/llama.cpp. During inference, the first date in the output is systematically wrong, at any temperature (including 0.0) and with any seed.

Here is the command I used: python -m mlx_lm.generate --model mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed --prompt "what is birthday of Albert Einstein" --temp 0.0 --max-tokens 100 --colorize --seed 195

On ollama: ollama run phi3 "what is birthday of Albert Einstein"

[Image: Screenshot 2024-04-25 at 18 38 28]

awni commented 6 months ago

Did you happen to try anything other than the 4-bit?

awni commented 6 months ago

Q8 with 64 group size is fine:

Prompt: <s><|user|>
What is birthday of Albert Einstein<|end|>
<|assistant|>

Albert Einstein was born on March 14, 1879.

Note: Einstein celebrated his birthday on March 14th, but he was born in the year 1879. The date format used here is in the format of month/day/year.<|end|><|assistant|> Here's a more detailed answer:

Albert Einstein, the renowned physicist and Nobel laureate, was born on March 14, 1879, in the city of Ulm, in the Kingdom of Württemberg in the German Empire. He is best known for developing the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). Einstein's work has had a profound impact on the way we understand the universe, and his famous equation, E=mc², has become synonymous with the concept of mass-energy equivalence.

Einstein's birthday is celebrated worldwide, and March 14th is often referred to as "Einstein Day." In 1999, the United Nations declared March 14th as the International Day of Physics, in honor of Einstein's
awni commented 6 months ago

Q4 with group size 32 is also fine:


==========
Prompt: <s><|user|>
What is birthday of Albert Einstein<|end|>
<|assistant|>

Albert Einstein was born on March 14, 1879. He was a renowned physicist who developed the theory of relativity, one of the two pillars of modern physics. His work on the photoelectric effect, Brownian motion, and the mass-energy equivalence formula E=mc² also contributed significantly to the field of physics. Einstein's birthday is celebrated worldwide as Albert Einstein Day.<|end|><|assistant|> Albert Einstein's birthday, March 14th, is also known as "Einstein Day" in many parts of the world. This day is celebrated to honor his contributions to the field of physics and his impact on our understanding of the universe.<|end|><|assistant|> Albert Einstein's birthday, March 14th, is also celebrated as "Einstein Day" in many parts of the world. This day is dedicated to celebrating his achievements and contributions to the field of physics, particularly his theory of relativity and the famous equation E=mc².<|end|><|assistant|> Albert Einstein was born on March 14, 1879, in the city of Ulm, in the Kingdom of Württemberg in the German Empire. He is widely known for
==========
awni commented 6 months ago

Chatting with @angeloskath about a possible long-term fix, both of these are on the table right now:

In the meantime, you can quantize with a group size of 32, which should give 4-bit enough precision to work well in most cases and likely won't be much slower.
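
For readers wondering why a smaller group size helps: below is a minimal pure-Python sketch of group-wise affine quantization, where each group of weights gets its own scale and offset. It is a simplified illustration, not MLX's actual quantizer; the function names `quantize_dequantize` and `mean_abs_error` and the random Gaussian "weights" are assumptions made up for this example. Smaller groups tend to span a narrower range of values, so the 16 levels available at 4 bits are spaced more finely.

```python
import random


def quantize_dequantize(weights, group_size, bits=4):
    """Round-trip weights through a simplified group-wise affine quantizer.

    Hypothetical helper for illustration only; not the MLX implementation.
    """
    levels = 2 ** bits - 1  # 15 steps between min and max at 4 bits
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        w_min, w_max = min(group), max(group)
        # Each group stores its own scale and offset (w_min).
        # Fall back to 1.0 so a constant group does not divide by zero.
        scale = (w_max - w_min) / levels or 1.0
        for w in group:
            code = round((w - w_min) / scale)  # integer code in [0, levels]
            restored.append(code * scale + w_min)
    return restored


def mean_abs_error(weights, group_size, bits=4):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, group_size, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


# Synthetic stand-in for a weight row; real model weights will differ.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

err_64 = mean_abs_error(weights, group_size=64)
err_32 = mean_abs_error(weights, group_size=32)
print(f"4-bit, group size 64: mean abs error {err_64:.4f}")
print(f"4-bit, group size 32: mean abs error {err_32:.4f}")
```

The trade-off is that halving the group size doubles the number of scales and offsets that must be stored and read, which is where the extra cost comes from.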

ivanfioravanti commented 6 months ago

Group size 32 is perfect. I have seen that it's better on Llama 3 too, as suggested by @angeloskath. Should this become the default? I know it is slower, but the results are slightly better. A good trade-off, maybe 🤷‍♂️

awni commented 6 months ago

Yes, it's a good suggestion and very much on the table. We're trying to improve the quantizer as well. Depending on how that goes, we'll decide about updating the default in the coming days.

awni commented 6 months ago

This is fixed in https://github.com/ml-explore/mlx/pull/1054#pullrequestreview-2030111476, and the bad version on the Hub has been replaced. It should give you reasonable results now:

==========
Prompt: <s><|user|>
what is birthday of Albert Einstein<|end|>
<|assistant|>

Albert Einstein was born on March 14, 1879. However, it's important to note that this date is incorrect. Albert Einstein was actually born on March 14, 1879, but in the year 1886, which was a common year starting on Saturday according to the Gregorian calendar. The confusion might arise from the fact that Einstein's date of birth is often celebrated on March 14th
==========