kuvaus opened this issue 1 year ago
thanks for the heads up. lemme know if you wanna take a stab at updating... looks complicated
I looked at it just enough to understand that I have no idea how it works. :)
I can try to implement one of the thermostats, but if I don't get anywhere during the weekend I'll leave it to smarter people. The problem is that the sampling changes all the results from the models... so it's quite important not to make too many mistakes.
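(For reference, the "thermostats" here presumably refer to the Mirostat samplers that the linked commit introduces. A minimal sketch of what the Mirostat v2 entry point looks like with the new API, assuming the signatures from that commit; `tau`/`eta` are the illustrative defaults from `examples/main.cpp`, and `candidates_p` is the `llama_token_data_array` built from the logits, see the next comment:)

```cpp
// Sketch only: Mirostat v2 sampling as exposed by llama.h after the linked commit.
float mirostat_tau = 5.0f;                // target "surprise" (cross-entropy); default from main.cpp
float mirostat_eta = 0.1f;                // learning rate; default from main.cpp
float mirostat_mu  = 2.0f * mirostat_tau; // running state, carried across sampled tokens

llama_sample_temperature(ctx, &candidates_p, temp);
llama_token id = llama_sample_token_mirostat_v2(
    ctx, &candidates_p, mirostat_tau, mirostat_eta, &mirostat_mu);
```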
Since we're already processing the prompt in batches, and the new version creates a `llama_token_data_array` using `llama_get_logits`, that implementation might be doing redundant work and could be simplified.
It would also be worth checking whether the new sampling produces results similar to the previous version, but the problem is that there's always randomness in these models.
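(For context, the candidate-building step mentioned above looks roughly like this in the updated `examples/main.cpp`; this is a sketch based on the code around that commit, so exact names may differ:)

```cpp
// Build a llama_token_data_array over the whole vocabulary from the logits
// of the last evaluated token; the new samplers then operate on this array.
const int n_vocab = llama_n_vocab(ctx);
float * logits = llama_get_logits(ctx);

std::vector<llama_token_data> candidates;
candidates.reserve(n_vocab);
for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
    candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
}
llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };
```

One way to compare the old and new sampling despite the randomness would be to fix the RNG seed in both builds, or to compare output at temperature zero, which should be (nearly) deterministic.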
Issue: Need to remove the `llama_sample_top_p_top_k` function in `llamamodel.cpp` and call the new sampling functions in `llama.cpp/llama.cpp` instead.

Description:
There's a new update to `llama.cpp` that changes the `llama_sample_top_p_top_k` function. The commit can be found at the following link: https://github.com/ggerganov/llama.cpp/commit/dd7eff57d8491792010b1002b8de6a4b54912e5c

In the `llama.cpp/llama.cpp` code, I noticed that there are separate functions for `llama_sample_top_k` and `llama_sample_top_p`. I believe the relevant part of the code that used to call `llama_sample_top_p_top_k` is inside `llama.cpp/examples/main.cpp`.

To fix compatibility, it might be necessary to implement parts of this in `llamamodel.cpp`. However, this requires some work, so it might be better to stay with the current version for now if there are more urgent issues to address.

The relevant changed lines of code in `llama.cpp/examples/main.cpp` are 412-469.
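(For anyone picking this up, a sketch of the replacement: the single old call versus the equivalent sequence of new-API calls used in `examples/main.cpp` at that commit. Parameter names follow the old `llama_sample_top_p_top_k` signature and may not match `llamamodel.cpp` exactly; `candidates_p` is the array built from `llama_get_logits` as in the comment above.)

```cpp
// Old API (removed by the commit linked above):
//   llama_token id = llama_sample_top_p_top_k(ctx, last_n_tokens_data, last_n_tokens_size,
//                                             top_k, top_p, temp, repeat_penalty);

// New API: the same pipeline is composed from separate sampler calls,
// each of which filters or reweights the candidates array in place.
llama_sample_repetition_penalty(ctx, &candidates_p,
                                last_n_tokens_data, last_n_tokens_size, repeat_penalty);
llama_sample_top_k(ctx, &candidates_p, top_k, 1);        // min_keep = 1 candidate
llama_sample_top_p(ctx, &candidates_p, top_p, 1);
llama_sample_temperature(ctx, &candidates_p, temp);
llama_token id = llama_sample_token(ctx, &candidates_p); // draw from what remains
```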