utilityai / llama-cpp-rs


Replicate llama.cpp default settings... #108

Closed: oddpxl closed this issue 9 months ago

oddpxl commented 9 months ago

When compiling llama.cpp "out of the box" and prompting it as follows (in this case on a Mac M1)...

./main -p "Write a rhyme haiku about a rabbit and a cube." -m llama-2-7b-chat.Q4_0.gguf -n 128 -ngl 33 --mlock --threads 8

We can see that llama.cpp uses the following sampling settings and order...

sampling:
  repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
  top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
  mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000

sampling order: CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature

generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 1

The ability to replicate these settings and the sampling order would be very useful when comparing results with llama.cpp.

Also, several of these are key to adjusting LLM behaviour, such as temperature and the repeat penalty.
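
For reference, here is a minimal plain-Rust sketch of that default chain. It is purely illustrative and not this crate's API: penalties, CFG, and the stages that are effectively no-ops at their default values (tfs_z = 1.0, typical_p = 1.0) are left out, as is min_p, leaving top_k -> top_p -> temperature. The `rand` crate is assumed as a dependency.

```rust
use rand::Rng;

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn sample(logits: &[f32], top_k: usize, top_p: f32, temp: f32, rng: &mut impl Rng) -> usize {
    // Pair token ids with their logits and sort by logit, highest first.
    let mut cand: Vec<(usize, f32)> = logits.iter().cloned().enumerate().collect();
    cand.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // top_k (default 40): keep only the k most likely tokens.
    cand.truncate(top_k.max(1));

    // top_p (default 0.95): keep the smallest prefix whose cumulative probability reaches top_p.
    let probs = softmax(&cand.iter().map(|&(_, l)| l).collect::<Vec<_>>());
    let mut cum = 0.0;
    let mut keep = cand.len();
    for (i, p) in probs.iter().enumerate() {
        cum += p;
        if cum >= top_p {
            keep = i + 1;
            break;
        }
    }
    cand.truncate(keep);

    // temperature (default 0.8): rescale the surviving logits, then draw from the result.
    let final_probs = softmax(&cand.iter().map(|&(_, l)| l / temp).collect::<Vec<_>>());
    let mut r = rng.gen::<f32>();
    for (i, p) in final_probs.iter().enumerate() {
        r -= p;
        if r <= 0.0 {
            return cand[i].0;
        }
    }
    cand[keep - 1].0
}

fn main() {
    let mut rng = rand::thread_rng();
    let logits = vec![2.0, 1.5, 0.3, -1.0, 0.9];
    println!("picked token id {}", sample(&logits, 40, 0.95, 0.8, &mut rng));
}
```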

MarcusDunn commented 9 months ago

We do replicate the default settings: the Default impls for both LlamaContextParams and LlamaModelParams defer to llama.cpp.
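
For example (the module paths here follow the crate's examples from memory and may differ between versions):

```rust
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::model::params::LlamaModelParams;

fn main() {
    // Both Default impls call straight through to llama.cpp's own default-params
    // functions, so these already match an out-of-the-box llama.cpp build.
    let model_params = LlamaModelParams::default();
    let ctx_params = LlamaContextParams::default();

    // Pass these to model loading / context creation as in the crate's examples.
    let _ = (&model_params, &ctx_params);
}
```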

main.cpp depends on common.cpp and sampling.cpp, both of which I consider out of scope for this project to maintain bindings to. (I wrote our own version of grammar.cpp to avoid extra bindings.)

There is a plan on the llama.cpp side to move sampling.cpp behind llama.h, in which case I imagine the sampling params would align a lot better. If there are specific context, model, or sampling params you want to tune, I'd be happy to add them one by one.
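
As a sketch of what tuning individual params could look like, assuming builder-style setters (names like with_n_ctx and with_n_gpu_layers are taken from memory of the crate's examples and may not match every version):

```rust
use std::num::NonZeroU32;

use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::model::params::LlamaModelParams;

fn main() {
    // Assumed builder-style setters; check the crate docs for the exact names.
    // n_ctx = 512 and n_gpu_layers = 33 mirror the main.cpp invocation above.
    let ctx_params = LlamaContextParams::default()
        .with_n_ctx(NonZeroU32::new(512));
    let model_params = LlamaModelParams::default()
        .with_n_gpu_layers(33);

    let _ = (&ctx_params, &model_params);
}
```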

I've created #109 to start slowly moving us towards being able to replicate main.cpp in Rust.

oddpxl commented 9 months ago

That all makes sense!

...and clearly I need to read up on sampling. Thanks for the pointers!