nomic-ai / gpt4all-chat

gpt4all-j chat

New llama.cpp update breaks llamamodel.cpp compatibility #200

Open kuvaus opened 1 year ago

kuvaus commented 1 year ago

Issue: Need to remove the llama_sample_top_p_top_k function in llamamodel.cpp and call the new sampling functions in llama.cpp/llama.cpp instead.

Description:

There's a new update to llama.cpp that changes the llama_sample_top_p_top_k function. The commit can be found at the following link: https://github.com/ggerganov/llama.cpp/commit/dd7eff57d8491792010b1002b8de6a4b54912e5c

In the llama.cpp/llama.cpp code, I noticed that there are separate functions for llama_sample_top_k and llama_sample_top_p. I believe the relevant part of the code that used to call llama_sample_top_p_top_k is inside llama.cpp/examples/main.cpp.

To fix compatibility, it might be necessary to implement parts of this in llamamodel.cpp. However, this requires some work, so it might be better to stay with the current version for now if there are more urgent issues to address.

The relevant changed lines of code in llama.cpp/examples/main.cpp are 412 - 469.
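Roughly, the new flow chains the separate sampling functions over a candidate array. Below is a sketch only, not the exact main.cpp code: the names top_k, top_p, temp, repeat_penalty and last_n_tokens are the usual sampling settings and history buffer assumed to be available in llamamodel.cpp, and the exact parameter lists (e.g. the min_keep argument) should be double-checked against llama.h at that commit.

```cpp
#include <vector>
#include "llama.h"

// Hypothetical replacement for the old llama_sample_top_p_top_k call.
// Assumes ctx has already been evaluated for the current prompt/token batch
// and last_n_tokens holds the recent token history for the repeat penalty.
static llama_token sample_next_token(llama_context * ctx,
                                     const std::vector<llama_token> & last_n_tokens,
                                     int top_k, float top_p, float temp,
                                     float repeat_penalty) {
    const float * logits  = llama_get_logits(ctx);
    const int     n_vocab = llama_n_vocab(ctx);

    // Build the candidate list from the raw logits of the last evaluated token.
    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
        candidates.push_back(llama_token_data{token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };

    // Each step filters or reweights the candidate array in place.
    llama_sample_repetition_penalty(ctx, &candidates_p,
            last_n_tokens.data(), last_n_tokens.size(), repeat_penalty);
    llama_sample_top_k(ctx, &candidates_p, top_k, 1);
    llama_sample_top_p(ctx, &candidates_p, top_p, 1);
    llama_sample_temperature(ctx, &candidates_p, temp);

    // Finally draw the next token from whatever candidates remain.
    return llama_sample_token(ctx, &candidates_p);
}
```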

manyoso commented 1 year ago

thanks for the heads up. lemme know if you wanna take a stab at updating... looks complicated

kuvaus commented 1 year ago

I looked at it just enough to understand that I have no idea how it works. :)

I can try to implement one of the thermostats, but if I don't get anywhere during the weekend I'll leave it to smarter people. The problem is that the sampling changes all the results from the models... so it's quite important not to make too many mistakes.

kuvaus commented 1 year ago

Since we're already processing the prompt in batches and the new version creates a llama_token_data_array using llama_get_logits, it might be that the implementation is doing redundant work and could be simplified.
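For reference, the part I mean is the per-token rebuild of the candidate array. A minimal sketch (same assumed names as above, not the actual main.cpp code): the vector could at least be allocated once outside the generation loop, since only the logit values change from token to token.

```cpp
// Reuse one candidates buffer across sampled tokens instead of reallocating it
// every iteration; only the logits differ between tokens.
const int n_vocab = llama_n_vocab(ctx);
std::vector<llama_token_data> candidates;
candidates.reserve(n_vocab);              // allocate once, before the generation loop

// ... then, for each token to sample:
const float * logits = llama_get_logits(ctx);
candidates.clear();
for (llama_token id = 0; id < n_vocab; ++id) {
    candidates.push_back(llama_token_data{id, logits[id], 0.0f});
}
llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };
```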

It would also be worth checking whether the new sampling produces results similar to the previous version, but the problem is that there's always randomness in these models.