Why is the context window limited to 2048 tokens when I deploy the qwen2.5-7b-instruct model with vLLM? No matter how high I set max_tokens, it doesn't help.
litellm.ContextWindowExceededError: ContextWindowExceededError: OpenAIException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 5067 tokens (4067 in the messages, 1000 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=210
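For context, here is a minimal sketch of the two places this limit can be set, assuming the model is served through vLLM's OpenAI-compatible server and called via dspy.LM; the model name, port, and token counts below are placeholders, not taken from my actual setup. The "maximum context length is 2048 tokens" error is enforced server-side by whatever `--max-model-len` resolved to at launch, so raising the client-side `max_tokens` cannot lift it; `max_tokens` only caps the completion portion of the budget.

```python
# Sketch only — model path, port, and limits are assumptions.
#
# Server side (shell): restart vLLM with a larger context window, e.g.
#   vllm serve Qwen/Qwen2.5-7B-Instruct --max-model-len 8192
#
# Client side: max_tokens must still fit inside the server's max-model-len
# together with the prompt (4067 prompt + 1000 completion = 5067 in the error).

import dspy

lm = dspy.LM(
    "openai/Qwen/Qwen2.5-7B-Instruct",    # hypothetical served-model name
    api_base="http://localhost:8000/v1",  # assumed local vLLM endpoint
    api_key="EMPTY",
    max_tokens=1000,
)
dspy.configure(lm=lm)
```

If the server really was started with (or fell back to) a 2048-token `--max-model-len`, is restarting it with a larger value the intended fix here, or is something in the dspy/litellm layer also clamping the window?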