teilomillet / gollm

Unified Go interface for Language Model (LLM) providers. Simplifies LLM integration with flexible prompt management and common task functions.
https://docs.gollm.co
Apache License 2.0

feat: Set basic inference parameters (top_p, min_p, repeat_penalty, repeat_last_n, seed) #8

Closed · sammcj closed this 1 month ago

sammcj commented 2 months ago

It would be great if you could set basic inference parameters with gollm.

Temperature by itself is a rather crude way of affecting generation; instead, you can use better sampling algorithms to influence the quality of the output, such as top_p, min_p, repeat_penalty, repeat_last_n, and seed.

Ollama's documentation has a useful table of the available parameters:

| Parameter | Description | Value Type | Example Usage |
| --- | --- | --- | --- |
| mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
| num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
| repeat_last_n | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | string | stop "AI assistant:" |
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
| num_predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
| min_p | Alternative to top_p, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0) | float | min_p 0.05 |
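
For reference, since gollm talks to Ollama's HTTP API rather than a Modelfile, these same parameters can be passed per request via the `options` field of `/api/generate`. A rough Go sketch (model name and values are illustrative):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Sketch: passing the sampling parameters from the table above to a
// local Ollama server at request time, via the "options" object of
// the /api/generate endpoint, instead of baking them into a Modelfile.
func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3", // illustrative model name
		"prompt": "Why is the sky blue?",
		"stream": false,
		"options": map[string]any{
			"top_p":          0.9,
			"min_p":          0.05,
			"repeat_penalty": 1.1,
			"repeat_last_n":  64,
			"seed":           42,
		},
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```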
teilomillet commented 1 month ago

Thank you Sam for the suggestion. I've made another branch (ollama_config) to implement these.

Can you check and confirm that this works for your usage before I merge the branch?

I've created examples/testing_examples.go, but I'm not sure it emulates how you're using it, nor whether it works.

You should be able to set any of these params by using the SetXXX options inside NewLLM; you can find the complete list in config.go.
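
For illustration, a minimal sketch of that pattern (the exact option names for the new sampling params are assumptions following the SetXXX convention; check config.go for the real list):

```go
package main

import (
	"context"
	"fmt"

	"github.com/teilomillet/gollm"
)

func main() {
	// Configure the new inference parameters as functional options
	// passed to NewLLM. Option names marked "assumed" follow the
	// SetXXX naming pattern described above and may differ in config.go.
	llm, err := gollm.NewLLM(
		gollm.SetProvider("ollama"),
		gollm.SetModel("llama3"), // illustrative model name
		gollm.SetTemperature(0.7),
		gollm.SetTopP(0.9), // assumed option name
		gollm.SetSeed(42),  // assumed option name
	)
	if err != nil {
		panic(err)
	}

	resp, err := llm.Generate(context.Background(), gollm.NewPrompt("Why is the sky blue?"))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp)
}
```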

Thanks,

teilomillet commented 1 month ago

I've merged them; it should be working.

sammcj commented 1 month ago

Oh gosh sorry I completely missed this!

Thanks so much for adding them, that's great :)