Before this patch, `llm chat` crashed once a conversation became sufficiently long (>4000 tokens by default). With this patch we just cleave off the beginning of the context so the conversation can continue. This is kind of hacky (once the context length is reached, the LLM can only produce `n_ctx - truncate_ctx` tokens per response, and of course forgets anything it can't deduce from the last `truncate_ctx` tokens), so any suggestions are welcome.
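For concreteness, here is a minimal sketch of the truncation idea, not the actual patch: once the accumulated token context grows past `n_ctx`, only the most recent `truncate_ctx` tokens are kept. The helper name and defaults below are illustrative assumptions.

```python
def truncate_context(tokens: list[int],
                     n_ctx: int = 4096,
                     truncate_ctx: int = 2048) -> list[int]:
    """Hypothetical helper: drop the oldest tokens once the context window is full.

    If the context already exceeds n_ctx, keep only the last truncate_ctx
    tokens so the model still has room to generate a response; otherwise
    return the context unchanged.
    """
    if len(tokens) > n_ctx:
        return tokens[-truncate_ctx:]
    return tokens
```

The trade-off is exactly the one described above: after truncation kicks in, each response is limited to roughly `n_ctx - truncate_ctx` new tokens, and anything outside the retained window is forgotten.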