google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0

low quality responses from gemma.cpp (gemma-2-27b) when compared to AIstudio and others #278

Closed: matteoserva closed this issue 1 week ago

matteoserva commented 1 week ago

Hello. Following an exchange with u/janwas_, I'm opening this report with a description of the issue and the steps to reproduce it.

The issue is that gemma.cpp produces much worse output from gemma-2-27b than other implementations do, namely gemma-2 in AI Studio and chatllm.cpp.

The simplest question that breaks the model in gemma.cpp is this Italian prompt ("Complete the sentence: the cat goes to the lard so often that..."):

Completa la frase: tanto va la gatta al lardo che...

Gemma 2 on AI Studio and chatllm.cpp (at Q8_0) both reply with the only correct answer, the standard ending of the proverb ("she leaves her paw there"):

ci lascia lo zampino

Instead, gemma.cpp, with weights downloaded from Kaggle, replies with a series of Italian words that don't even form a grammatically correct sentence:

> Completa la frase: tanto va la gatta al lardo che...
[ Reading prompt ] ........................
...**ci si lascia un dente.**

Here is the launch command used for gemma.cpp (also tested with --temperature 0.01):

./gemma --tokenizer gemma-tokenizer.spm --model 27b-it --compressed_weights ./gemma-2-27b-it-sfp.sbs

Here is another simple problem that gemma-2 on AI Studio and chatllm.cpp solve easily but gemma.cpp cannot (the correct answer is 7 or 8):

> Matteo has 20 apples, he buys 20 oranges. Then he discards half of his fruits equally. Then he discards a quarter of his fruits equally between apples and oranges. How many apples remain?
[ Reading prompt ] .....................................................
Here's how to solve this problem step-by-step:
1. **Start with the total:** Matteo begins with 20 apples + 20 oranges = 40 fruits.
2. **First discard:** After discarding half, he has 40 / 2 = 20 apples left.
3. **Second discard:**  He had 40 fruits, so after the first discard, he has 40 / 2 = 20 fruits left.
**Therefore, after discarding half of his apples and a quarter of his oranges, Matteo will have 20 apples remaining.**
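
For reference, here is one reading of the prompt that yields the stated answers of 7 or 8, as a minimal C++ sketch. The `AppleRange` struct and `ApplesRemaining` function are hypothetical names used only for illustration, and the sketch assumes the first discard splits evenly between the two kinds of fruit.

```cpp
#include <cassert>

// Hypothetical helper types for illustration only.
struct AppleRange {
  int min;  // apples left if the odd fruit discarded is an apple
  int max;  // apples left if the odd fruit discarded is an orange
};

inline AppleRange ApplesRemaining(int apples, int oranges) {
  assert(apples >= 0 && oranges >= 0);

  // Step 1: discard half of all fruit, split equally between kinds.
  // With 20 + 20 = 40 fruits, 20 are discarded: 10 apples, 10 oranges.
  const int discard1 = (apples + oranges) / 2;
  apples -= discard1 / 2;
  oranges -= discard1 / 2;

  // Step 2: discard a quarter of the remaining fruit (20 / 4 = 5).
  const int discard2 = (apples + oranges) / 4;

  // An odd count cannot be split evenly, so the apples lose either
  // the smaller or the larger share (2 or 3 of the 5).
  const int small_share = discard2 / 2;
  const int large_share = discard2 - small_share;

  return {apples - large_share, apples - small_share};
}
```

Calling `ApplesRemaining(20, 20)` gives the range 7 to 8, which matches the answers the other implementations reach.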

All tests were run against gemma-2-27b. The gemma.cpp commit is 8ac5d66575429c4fca19fb394c8926074352c766.

jan-wassenberg commented 1 week ago

Thank you, appreciate you filing the issue. Looks like it was indeed the softcap. Fix coming shortly, after which both queries work as expected.
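
For readers unfamiliar with the term: logit soft-capping, as generally described for Gemma 2, squashes each logit with `cap * tanh(x / cap)` so that values stay within roughly `[-cap, cap]`. The sketch below is only an illustration of that formula, not the gemma.cpp code or the actual fix. `SoftCap` and `SoftCapInPlace` are hypothetical names, and the cap values mentioned in the comments (50.0 for attention logits, 30.0 for final logits) are the ones reported for Gemma 2 and should be treated as assumptions here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not the gemma.cpp API.
// Maps x smoothly into roughly [-cap, cap]; near zero it is close to
// the identity, and for large |x| it saturates at +/- cap.
inline float SoftCap(float x, float cap) {
  assert(cap > 0.0f);
  return cap * std::tanh(x / cap);
}

// Applies the soft cap to every logit in place.
// Assumed values reported for Gemma 2: cap = 50.0f for attention
// logits, cap = 30.0f for the final (output) logits.
inline void SoftCapInPlace(std::vector<float>& logits, float cap) {
  for (float& x : logits) {
    x = SoftCap(x, cap);
  }
}
```

If an implementation skips this step or applies it with the wrong cap, the logits it feeds into softmax differ from those the model was trained with, which is consistent with the kind of degraded output shown in the report.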

matteoserva commented 1 week ago

I confirm that the output after https://github.com/google/gemma.cpp/pull/279 matches exactly what was expected and is aligned with all the other implementations.

jan-wassenberg commented 1 week ago

Great, thank you for confirming, and reaching out with the repro case :D