stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License

max_tokens #1758

Closed · Ranking666 closed 1 week ago

Ranking666 commented 2 weeks ago

Why does the server only allow a 2048-token context when I deploy the qwen2.5-7b-instruct model with vLLM? And no matter how high I set max_tokens, it has no effect.

litellm.ContextWindowExceededError: ContextWindowExceededError: OpenAIException - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 5067 tokens (4067 in the messages, 1000 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=210
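For context, my DSPy setup is roughly the following (the base URL, API key, and request settings are simplified placeholders for my actual deployment):

```python
import dspy

# Point DSPy at a local vLLM OpenAI-compatible server.
# Placeholder endpoint and key; adjust to your deployment.
lm = dspy.LM(
    "openai/Qwen/Qwen2.5-7B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
    max_tokens=1000,  # completion budget requested per call
)
dspy.configure(lm=lm)
```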

chenmoneygithub commented 2 weeks ago

@Ranking666 Thanks for reporting the issue! This is not a DSPy issue, but I think you need this: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#extended-context-support
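To elaborate: the 2048-token limit in the error is enforced by the vLLM server, not by DSPy, so raising max_tokens on the client only enlarges the requested completion budget and makes the 400 error more likely. Assuming the server was launched with a small (or truncated) --max-model-len, one sketch of a fix is to restart it with a larger context window, up to the model's native 32K; for contexts beyond that, enable the YaRN rope scaling described in the linked docs:

```bash
# Serve Qwen2.5-7B-Instruct with its native 32K context window.
# (Beyond 32K requires the YaRN rope_scaling setup from the Qwen docs.)
vllm serve Qwen/Qwen2.5-7B-Instruct --max-model-len 32768
```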