mdegans / weave

Branching story writing tool with generative AI

Greatly improved LLaMA sampling defaults. #2

Closed mdegans closed 2 months ago

mdegans commented 2 months ago

By default, drama_llama was using greedy sampling with no repetition penalty. This was a bug in the `Default` implementations for various settings structs. The default is now locally typical sampling with a minor repetition penalty, which should greatly improve generation quality.
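A minimal sketch of what such a `Default` fix might look like. The struct, field, and value choices below are assumptions for illustration, not drama_llama's actual API:

```rust
// Hypothetical settings structs; names and values are assumptions,
// not drama_llama's real types.

#[derive(Debug, PartialEq)]
enum SamplingMode {
    /// Always pick the highest-probability token (the old, buggy default).
    Greedy,
    /// Locally typical sampling with typicality threshold `p`.
    LocallyTypical { p: f32 },
}

#[derive(Debug)]
struct SampleOptions {
    mode: SamplingMode,
    /// Penalty > 1.0 discourages repeating recent tokens.
    repetition_penalty: f32,
}

impl Default for SampleOptions {
    fn default() -> Self {
        // Old (buggy) behavior was equivalent to:
        //   mode: SamplingMode::Greedy, repetition_penalty: 1.0
        // New default: locally typical sampling plus a minor penalty.
        Self {
            mode: SamplingMode::LocallyTypical { p: 0.95 },
            repetition_penalty: 1.1,
        }
    }
}

fn main() {
    let opts = SampleOptions::default();
    assert_eq!(opts.mode, SamplingMode::LocallyTypical { p: 0.95 });
    assert!(opts.repetition_penalty > 1.0);
    println!("default sampling mode: {:?}", opts.mode);
}
```

The point of the fix is that `SampleOptions::default()` no longer silently degenerates to greedy decoding; callers who never set options explicitly now get the improved behavior.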

Additionally, llama.cpp has been updated. Because the tokenizer code has changed, any existing models will need to be updated as well. A warning will be printed in the terminal if this is the case.