Seems to be working as before.
New sampling parameter defaults in llama.cpp are:
```cpp
tfs_z             = 1.0f;
typical_p         = 1.0f;
frequency_penalty = 0.0f;
presence_penalty  = 0.0f;
mirostat          = 0; // 0 disabled, 1 for v1, 2 for v2.
mirostat_tau      = 5.0f;
mirostat_eta      = 0.1f;
```
Might as well add those in and try.
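For reference, here's a rough sketch of where each of those defaults plugs into the new sampling API. This follows the general shape of how the llama.cpp example code chains the samplers, but it isn't copied from it; `cands`, `last_tokens`, `temp`, and `mu` are assumed caller state:

```cpp
#include <vector>
#include "llama.h"

// Sketch (not verbatim from llama.cpp): chain the new samplers using
// the defaults listed above. cands is built from the current logits;
// mu persists across calls when mirostat is enabled.
static llama_token sample_next(llama_context * ctx,
                               llama_token_data_array * cands,
                               const std::vector<llama_token> & last_tokens,
                               float temp, float * mu) {
    const float tfs_z             = 1.0f;
    const float typical_p         = 1.0f;
    const float frequency_penalty = 0.0f;
    const float presence_penalty  = 0.0f;
    const int   mirostat          = 0;
    const float mirostat_tau      = 5.0f;
    const float mirostat_eta      = 0.1f;

    llama_sample_frequency_and_presence_penalties(ctx, cands,
            last_tokens.data(), last_tokens.size(),
            frequency_penalty, presence_penalty);

    if (mirostat == 1) {
        llama_sample_temperature(ctx, cands, temp);
        return llama_sample_token_mirostat(ctx, cands,
                mirostat_tau, mirostat_eta, /*m=*/100, mu);
    }
    if (mirostat == 2) {
        llama_sample_temperature(ctx, cands, temp);
        return llama_sample_token_mirostat_v2(ctx, cands,
                mirostat_tau, mirostat_eta, mu);
    }
    llama_sample_tail_free(ctx, cands, tfs_z, 1);   // no-op at z = 1.0
    llama_sample_typical(ctx, cands, typical_p, 1); // no-op at p = 1.0
    llama_sample_temperature(ctx, cands, temp);
    return llama_sample_token(ctx, cands);
}
```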
Hrm... for mirostat, it looks like we need to remember a mu value across subsequent calls. That's going to be a little tricky to maintain with undoing. We should probably make a structure that wraps and maintains the current context tokens alongside a new parallel array of the mu values computed for each token (or carried over from the previous token in the case of user input).
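Something like this, maybe (just a sketch; the struct and its member names are hypothetical, not llama.cpp API):

```cpp
#include <vector>
#include "llama.h"

// Hypothetical wrapper keeping per-token mirostat mu values in lockstep
// with the context tokens, so undoing N tokens also rewinds mu.
struct TokenContext {
    std::vector<llama_token> tokens; // context tokens, oldest first
    std::vector<float>       mus;    // mu after sampling tokens[i]

    // Record a sampled token along with the mu that mirostat produced.
    void push_sampled(llama_token tok, float mu) {
        tokens.push_back(tok);
        mus.push_back(mu);
    }

    // User-input tokens aren't sampled, so carry the previous mu over.
    void push_user(llama_token tok, float initial_mu) {
        tokens.push_back(tok);
        mus.push_back(mus.empty() ? initial_mu : mus.back());
    }

    // Undo the last n tokens; mus.back() is then the mu to resume with.
    void undo(size_t n) {
        tokens.resize(tokens.size() - n);
        mus.resize(mus.size() - n);
    }

    float current_mu(float initial_mu) const {
        return mus.empty() ? initial_mu : mus.back();
    }
};
```

The llama.cpp example code initializes mu to 2 * mirostat_tau, so that's a reasonable fallback to pass as `initial_mu` when the arrays are empty.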
https://github.com/ggerganov/llama.cpp/pull/1126 introduced some new sampling methods. Right now we use the repetition penalty. It does a decent job of avoiding repeated content for a while, but it's certainly not perfect; for example, a large penalty window penalizes a lot of punctuation and causes run-on sentences. We can already mitigate this by excluding tokens from the penalized list, but it's a balancing act that I'm not very good at.
First order of business is to get repeat_penalty working with the new API.
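If I'm reading the new API right, that should look roughly like this (sketch only; `last_tokens` and `repeat_penalty` are assumed caller state):

```cpp
#include <vector>
#include "llama.h"

// Sketch: apply repeat_penalty through the new candidates-array API.
// last_tokens is the recent window being penalized (assumed state).
llama_token sample_with_repeat_penalty(llama_context * ctx,
        const std::vector<llama_token> & last_tokens, float repeat_penalty) {
    const int n_vocab = llama_n_vocab(ctx);
    float * logits = llama_get_logits(ctx);

    // Build the candidates array from the current logits.
    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token id = 0; id < n_vocab; ++id) {
        candidates.push_back({id, logits[id], 0.0f});
    }
    llama_token_data_array cands = { candidates.data(), candidates.size(), false };

    // Penalize tokens that already appear in the recent window.
    llama_sample_repetition_penalty(ctx, &cands,
            last_tokens.data(), last_tokens.size(), repeat_penalty);

    return llama_sample_token(ctx, &cands);
}
```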