Description: When an OpenAI assistant is invoked, it creates a run with default settings, exposing only a few request fields to the user. The truncation strategy defaults to auto, which includes all previous messages in the thread along with the current question, up to the model's context length. As a result, token usage grows cumulatively with each run:
consumed_tokens = previous_consumed_tokens + current_consumed_tokens.
This PR adds support for user-defined truncation strategies, giving better control over token consumption.
Issue: High token consumption.
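Below is a minimal sketch of how a caller could pin the truncation strategy when creating a run, using the OpenAI Python SDK. The thread and assistant IDs and the `last_messages` value of 3 are illustrative assumptions, not values from this PR:

```python
# Illustrative sketch with the OpenAI Python SDK; IDs and the
# last_messages value below are assumptions for demonstration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_abc123",    # hypothetical thread ID
    assistant_id="asst_abc123",   # hypothetical assistant ID
    # Override the default "auto" strategy: only the 3 most recent
    # thread messages are sent as context, capping token consumption.
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 3,
    },
)

print(run.usage)  # token usage once the run has completed
```

With `last_messages`, the per-run context stays bounded instead of accumulating every prior message in the thread, which is the behavior this PR aims to let users control.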