This PR attempts to truncate the message history to keep the prompt within the 4096 token context window of LLaMA 2. Assuming that the first message between the user and assistant sets a topic for the conversation, the truncation logic removes from the middle, following that initial exchange.
This PR attempts to truncate the message history to keep the prompt within the 4096 token context window of LLaMA 2. Assuming that the first message between the user and assistant sets a topic for the conversation, the truncation logic removes from the middle, following that initial exchange.