rendezqueue / rendezllama

CLI for llama.cpp with various commands to guide, edit, and regenerate tokens on the fly.

feat(option): to use different samplers #20

Closed. grencez closed this issue 1 year ago.

grencez commented 1 year ago

https://github.com/ggerganov/llama.cpp/pull/1126 introduced some new samplers. Right now, we only use the repetition penalty. It does a decent job of avoiding repeated content for a while, but it's certainly not perfect: for example, a large penalty window penalizes a lot of punctuation and causes run-on sentences. We can already mitigate this by excluding tokens from the penalized list, but tuning that list is a balancing act that I'm not very good at.

First order of business is to get repeat_penalty working with the new API.
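For reference, a rough sketch of what that could look like against the sampler API from that PR. This is illustrative only, not rendezllama's actual code; the helper name, the penalized-token vector, and the top-p/temperature values are assumptions.

#include <vector>
#include "llama.h"

// Sketch: build a candidate array from the last eval's logits, apply the
// repetition penalty over a window of recent tokens, then sample a token.
static llama_token
sample_with_repeat_penalty(llama_context* ctx,
                           const std::vector<llama_token>& penalized_tokens,
                           float repeat_penalty)
{
  float* logits = llama_get_logits(ctx);
  const int n_vocab = llama_n_vocab(ctx);

  std::vector<llama_token_data> candidates;
  candidates.reserve(n_vocab);
  for (llama_token id = 0; id < n_vocab; ++id) {
    candidates.push_back(llama_token_data{id, logits[id], 0.0f});
  }
  llama_token_data_array candidates_array = {
    candidates.data(), candidates.size(), /*sorted=*/false,
  };

  // Penalize tokens that appear in the recent window, then finish with the
  // usual top-p / temperature steps before picking a token.
  llama_sample_repetition_penalty(ctx, &candidates_array,
                                  penalized_tokens.data(),
                                  penalized_tokens.size(),
                                  repeat_penalty);
  llama_sample_top_p(ctx, &candidates_array, 0.95f, 1);
  llama_sample_temperature(ctx, &candidates_array, 0.8f);
  return llama_sample_token(ctx, &candidates_array);
}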

grencez commented 1 year ago

Seems to be working as before.

New sampling parameter defaults in llama.cpp are:

tfs_z = 1.0f;
typical_p = 1.0f;
frequency_penalty = 0.0f;
presence_penalty = 0.0f;

mirostat = 0;  // 0 disabled, 1 for v1, 2 for v2.
mirostat_tau = 5.0f;
mirostat_eta = 0.1f;

Might as well add those in and try.
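Roughly, the new parameters could slot into the per-token sampling chain the same way llama.cpp's own example did at the time. Everything below is a sketch: the Params struct, the min_keep values, and the function name are placeholders, not rendezllama's real options.

// Hypothetical options struct mirroring the llama.cpp defaults listed above.
struct Params {
  float repeat_penalty = 1.1f;
  float frequency_penalty = 0.0f;
  float presence_penalty = 0.0f;
  int top_k = 40;
  float tfs_z = 1.0f;
  float typical_p = 1.0f;
  float top_p = 0.95f;
  float temp = 0.8f;
  int mirostat = 0;  // 0 disabled, 1 for v1, 2 for v2.
  float mirostat_tau = 5.0f;
  float mirostat_eta = 0.1f;
};

// Sketch of the per-token sampling chain. `mirostat_mu` must persist across
// calls when mirostat is enabled (see the next comment on this issue).
static llama_token
sample_next(llama_context* ctx,
            llama_token_data_array* candidates,
            const std::vector<llama_token>& penalized_tokens,
            const Params& p,
            float* mirostat_mu)
{
  llama_sample_repetition_penalty(ctx, candidates,
                                  penalized_tokens.data(),
                                  penalized_tokens.size(),
                                  p.repeat_penalty);
  llama_sample_frequency_and_presence_penalties(ctx, candidates,
                                                penalized_tokens.data(),
                                                penalized_tokens.size(),
                                                p.frequency_penalty,
                                                p.presence_penalty);
  if (p.mirostat == 1) {
    llama_sample_temperature(ctx, candidates, p.temp);
    return llama_sample_token_mirostat(ctx, candidates, p.mirostat_tau,
                                       p.mirostat_eta, /*m=*/100, mirostat_mu);
  }
  if (p.mirostat == 2) {
    llama_sample_temperature(ctx, candidates, p.temp);
    return llama_sample_token_mirostat_v2(ctx, candidates, p.mirostat_tau,
                                          p.mirostat_eta, mirostat_mu);
  }
  // Default chain: top-k, tail-free, typical, top-p, temperature.
  llama_sample_top_k(ctx, candidates, p.top_k, 1);
  llama_sample_tail_free(ctx, candidates, p.tfs_z, 1);
  llama_sample_typical(ctx, candidates, p.typical_p, 1);
  llama_sample_top_p(ctx, candidates, p.top_p, 1);
  llama_sample_temperature(ctx, candidates, p.temp);
  return llama_sample_token(ctx, candidates);
}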

grencez commented 1 year ago

Hrm... for mirostat, it looks like we need to remember a mu value across subsequent calls. That will be a little tricky to maintain with undoing. We should probably make a structure that wraps and maintains the current context tokens alongside a new array of the mu values computed for each token (or carried over from the previous token in the case of user input).
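One possible shape for that structure, just a sketch rather than a committed design: pair a mu value with every context token so that trimming tokens during an undo rolls mu back to the matching state.

#include <vector>
#include "llama.h"

// Hypothetical history that pairs each context token with the mirostat mu
// value in effect after that token, so undo can roll both back together.
struct SamplingHistory {
  std::vector<llama_token> tokens;
  std::vector<float> mus;

  // Record a generated token together with the mu that mirostat produced.
  void push_generated(llama_token token, float mu) {
    tokens.push_back(token);
    mus.push_back(mu);
  }

  // User-supplied tokens don't run the sampler, so carry the previous mu forward.
  void push_user_input(llama_token token, float initial_mu) {
    tokens.push_back(token);
    mus.push_back(mus.empty() ? initial_mu : mus.back());
  }

  // Undo the last n tokens; the surviving tail's mu becomes current again.
  void rollback(size_t n) {
    const size_t keep = (n < tokens.size()) ? tokens.size() - n : 0;
    tokens.resize(keep);
    mus.resize(keep);
  }

  // The mu to feed into the next mirostat sampling call.
  float current_mu(float initial_mu) const {
    return mus.empty() ? initial_mu : mus.back();
  }
};

llama.cpp's example seeds mu as 2 * mirostat_tau before the first sampling call, so that seems like the natural initial value to use here as well.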