simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp
Apache License 2.0

context truncation for long chats #30

Open tom-p-reichel opened 8 months ago

tom-p-reichel commented 8 months ago

Before this patch, `llm chat` crashed once a conversation became sufficiently long (>4000 tokens by default). With this patch we simply cleave off the beginning of the context so the conversation can continue. This is kind of hacky: once the context limit has been reached, the LLM can only produce n_ctx - truncate_ctx tokens per response, and it of course forgets anything it cannot deduce from the last truncate_ctx tokens. Any suggestions welcome. A rough sketch of the idea is below.
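
For illustration only, here is a minimal sketch of the truncation idea described above. The names (`n_ctx`, `truncate_ctx`, `truncate_context`) are illustrative and not necessarily the plugin's actual API; the point is just that once the token count exceeds the context window, the oldest tokens are dropped so the most recent `truncate_ctx` tokens remain.

```python
# Hypothetical sketch of context truncation for long chats.
# `n_ctx` and `truncate_ctx` mirror the parameters mentioned above;
# this is not the actual patch, just the general approach.

def truncate_context(tokens: list[int], n_ctx: int = 4096,
                     truncate_ctx: int = 3072) -> list[int]:
    """Keep only the most recent tokens once the context window is exceeded.

    If the prompt already fits within `n_ctx`, return it unchanged.
    Otherwise cleave off the oldest tokens so that at most `truncate_ctx`
    remain, leaving n_ctx - truncate_ctx tokens of headroom for the
    model's next response.
    """
    if len(tokens) <= n_ctx:
        return tokens
    return tokens[-truncate_ctx:]


# Example: a 5000-token conversation with the defaults above would be
# trimmed to its last 3072 tokens before being sent to the model.
```

The trade-off this makes explicit is the one noted above: anything outside the retained window is simply forgotten, and the smaller `truncate_ctx` is relative to `n_ctx`, the more room the model has to respond per turn.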