We do replicate the default settings. The `Default` impls for both `LlamaContextParams` and `LlamaModelParams` defer to llama.cpp.

`main.cpp` depends on `common.cpp` and `sampling.cpp`, both of which I consider out of scope for this project to maintain bindings to. (I wrote our own version of `grammar.cpp` to avoid extra bindings.)
There is a plan on the llama.cpp side to move `sampling.cpp` behind `llama.h`, in which case I imagine the sampling params would align a lot better. If there are specific context, model, or sampling params you want to tune, I'd be happy to add them one by one.
I've created #109 to start slowly moving us towards being able to replicate `main.cpp` in Rust.
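For illustration, tuning a couple of params on top of those defaults would look roughly like this - the module paths and `with_*` setter names here are assumptions and may not match what the crate currently exposes, so treat it as a sketch only:

```rust
use std::num::NonZeroU32;
// NOTE: assumed module paths / setter names; check the crate docs for the real API.
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::model::params::LlamaModelParams;

fn main() {
    // The defaults defer to llama_model_default_params() /
    // llama_context_default_params() from llama.h, as discussed above.
    let model_params = LlamaModelParams::default()
        // assumed setter: offload 33 layers to the GPU, like `-ngl 33`
        .with_n_gpu_layers(33);

    let ctx_params = LlamaContextParams::default()
        // assumed setter: raise the context window from the default
        .with_n_ctx(NonZeroU32::new(2048));

    // ...then pass model_params when loading the model and ctx_params when
    // creating the context, the same way main.cpp uses its parsed CLI options.
    let _ = (model_params, ctx_params);
}
```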
That all makes sense!
...and clearly I need to read up on sampling - thanks for the pointers!
When compiling llama.cpp "out of the box" and prompting it as follows (in this case on a Mac M1)...
./main -p "Write a rhyme haiku about a rabbit and a cube." -m llama-2-7b-chat.Q4_0.gguf -n 128 -ngl 33 --mlock --threads 8
We can see that llama.cpp uses the following sampling settings and order...
sampling: repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 1
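For anyone comparing against that output, here is a rough, plain-Rust sketch of what the default chain boils down to once mirostat is off and CFG is unused. This is not the bindings' API, just an illustration of the order (penalties first, then the truncation samplers, then temperature):

```rust
/// A token candidate: the token id plus its raw logit from the model.
struct Candidate {
    id: u32,
    logit: f32,
}

/// Numerically stabilised softmax over the candidates' logits.
fn softmax(cands: &[Candidate]) -> Vec<f32> {
    let max = cands.iter().map(|c| c.logit).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = cands.iter().map(|c| (c.logit - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Keep only the k highest-logit candidates (top_k = 40 in the defaults).
fn top_k(cands: &mut Vec<Candidate>, k: usize) {
    cands.sort_by(|a, b| b.logit.total_cmp(&a.logit));
    cands.truncate(k.max(1));
}

/// Keep the smallest prefix whose cumulative probability reaches p
/// (top_p = 0.95 in the defaults). Assumes `cands` is sorted descending.
fn top_p(cands: &mut Vec<Candidate>, p: f32) {
    let probs = softmax(cands);
    let mut cum = 0.0;
    let mut keep = cands.len();
    for (i, prob) in probs.iter().enumerate() {
        cum += prob;
        if cum >= p {
            keep = i + 1;
            break;
        }
    }
    cands.truncate(keep);
}

/// Scale logits by 1/temperature (temp = 0.8 in the defaults).
fn temperature(cands: &mut [Candidate], temp: f32) {
    for c in cands {
        c.logit /= temp;
    }
}

/// The default chain with mirostat off. Penalties would run before top_k;
/// tfs_z and typical_p sit between top_k and top_p, and min_p between top_p
/// and temperature. At their defaults tfs_z = 1.0 and typical_p = 1.0 are
/// no-ops; min_p = 0.05 does filter, but is left out here only for brevity.
fn pick_token(mut cands: Vec<Candidate>) -> u32 {
    top_k(&mut cands, 40);
    top_p(&mut cands, 0.95);
    temperature(&mut cands, 0.8);
    // main.cpp would now draw randomly from softmax(cands); we take the most
    // likely remaining token just to keep the sketch short.
    cands[0].id
}

fn main() {
    let cands = vec![
        Candidate { id: 1, logit: 2.0 },
        Candidate { id: 2, logit: 1.5 },
        Candidate { id: 3, logit: 0.1 },
    ];
    println!("picked token {}", pick_token(cands));
}
```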
The ability to replicate these settings and the sampling order would be very useful when comparing results with llama.cpp.
Also - several of these are key to adjusting LLM behaviour, like temperature and the penalties.
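Since the penalties came up: this is the conceptual "Penalties" step for the defaults above (repeat_last_n = 64, repeat_penalty = 1.1). It mirrors how llama.cpp's repetition penalty behaves, but the function itself is only an illustration, not something exposed by the bindings:

```rust
use std::collections::HashSet;

/// Penalise tokens that appeared in the last repeat_last_n generated tokens.
/// Frequency and presence penalties default to 0.0, so they are omitted here.
fn apply_repeat_penalty(logits: &mut [f32], recent_tokens: &[u32], repeat_penalty: f32) {
    // The penalty is applied once per distinct recent token.
    let recent: HashSet<u32> = recent_tokens.iter().copied().collect();
    for tok in recent {
        let logit = &mut logits[tok as usize];
        // Dividing positive logits and multiplying negative ones both reduce
        // the token's probability, matching llama.cpp's behaviour.
        if *logit > 0.0 {
            *logit /= repeat_penalty;
        } else {
            *logit *= repeat_penalty;
        }
    }
}

fn main() {
    let mut logits = vec![0.5_f32, -0.2, 1.3, 0.0]; // one logit per vocab token
    apply_repeat_penalty(&mut logits, &[2, 2, 1], 1.1);
    println!("{:?}", logits);
}
```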