I have done some sanity checks against the following, as I found these were the closest to what I am experiencing:

- https://github.com/vllm-project/vllm/pull/1395
- https://github.com/vllm-project/vllm/pull/1796
- https://github.com/vllm-project/vllm/pull/1858
- https://github.com/vllm-project/vllm/pull/972
### Your current environment

### How would you like to use vllm
I am observing a change in the behavior of vLLM since updating the library from ~v0.2.0 to the latest v0.4.1 build.
What are the changes?
Models that used to load as-is no longer do in the new version I upgraded to. For example, on an A10 GPU the API now requires the `max-model-len` param to be set for every model, because without it it can't load any 7B models. From the description, `max_model_len` is one of the engine arguments and sets the context length for the model, and making it compulsory does work around the problem. What I am unable to figure out is why the model doesn't load with its full context length after updating.

I admit this jump from 0.2 to 0.4 is far from an ideal upgrade when there were so many versions released in between. I tried looking into the changelogs but couldn't discern what was going on; I get this as the error output.
Can I get help in understanding what changed?
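For reference, this is roughly how I am loading the model now; treat it as a minimal sketch rather than my exact setup (the model name and the 4096 context length are just illustrative values):

```python
from vllm import LLM, SamplingParams

# Sketch: explicitly capping the context length so a 7B model fits on a
# single 24 GB A10. The model name and max_model_len value are examples.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    max_model_len=4096,          # engine argument; caps the model's context length
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM is allowed to use
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

When serving through the OpenAI-compatible API server, the equivalent is passing `--max-model-len` on the command line.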
Also, I keep seeing a burst of these messages and I am not sure what the model is trying to do here.