[Open] garyyang85 opened this issue 3 months ago
DeepSeek V2 Lite has a 32K context. Update: sorry, I made a mistake.
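For reference, one quick way to check the context window a model advertises is to read its Hugging Face config. This is only a sketch: the model ID below is assumed from the issue title, and `max_position_embeddings` is the standard config field, not something confirmed in this thread.

```python
from transformers import AutoConfig

# Model ID assumed from the issue title; DeepSeek repos need trust_remote_code.
cfg = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,
)
# The advertised maximum context length, in tokens.
print(cfg.max_position_embeddings)
```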
The error is saying that the amount of memory available cannot handle such a large context.
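A rough sketch of why: at serve time the KV cache must hold every token of every in-flight sequence, and its size grows linearly with context length. The estimator below assumes plain multi-head attention with placeholder shapes; DeepSeek V2 uses MLA, which stores a compressed cache, so treat the output as illustrative rather than this model's real footprint.

```python
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_tokens: int, dtype_bytes: int = 2) -> float:
    """Rough KV-cache size for ONE sequence under plain multi-head attention."""
    # Keys and values are cached separately, hence the factor of 2.
    per_token_bytes = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token_bytes * context_tokens / 1024**3

# Placeholder shapes, NOT this model's actual config:
print(f"~{kv_cache_gib(27, 16, 128, 128 * 1024):.0f} GiB for one 128K-token sequence")
```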
@simon-mo Thanks for your reply. So my understanding is: if I remove the "--max-seq-len 63040 --max-model-len 30720" params and there is enough memory, vLLM will reach the maximum context length the model supports. When memory is not enough, --max-seq-len and --max-model-len are the balance we can configure so the model still works with vLLM, at a reduced context length. And the feasible number is evaluated by vLLM. Right?
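If that reading is right, the same trade-off can be expressed in vLLM's offline `LLM` API, which accepts the engine arguments behind the server flags. A minimal sketch, assuming the model ID from the issue title and reusing the value quoted in this thread:

```python
from vllm import LLM

# Cap max_model_len so the KV cache fits in memory, and/or raise
# gpu_memory_utilization to buy back some context length.
llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",  # assumed model ID
    tensor_parallel_size=2,       # 2 x L40, per the issue
    trust_remote_code=True,
    max_model_len=30720,          # value quoted in this thread
    gpu_memory_utilization=0.95,  # vLLM's default is 0.90
)
```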
Your current environment
I found that deepseek-coder-v2-lite-instruct can be started on 2 x L40 GPUs with vLLM 0.5.1, but the context cannot reach 128K; it only reached 9415 tokens in my test. Below is my start cmd.
When I remove --max-seq-len 63040 --max-model-len 30720, it reports an error at startup: