[Usage]: ValueError: User-specified max_model_len (8192) is greater than the derived max_model_len (sliding_window=4096 or model_max_length=None in model's config.json). #6253
Your current environment
How would you like to use vllm
I want to launch vllm with Vigostral 7B Chat AWQ with prefix caching enabled. To enable prefix caching, I also have to disable the sliding window.
This leads to a restriction on the max_model_len setting, which gets capped at the default sliding window value, according to this line of code: https://github.com/vllm-project/vllm/blob/5d5b4c5fe524c3b62453bba7ad4434a27c81317a/vllm/config.py#L1392
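For reference, here is a minimal sketch of the launch that triggers the error (the HF repo id is my assumption; the engine arguments are the standard vLLM ones):

```python
from vllm import LLM

# Sketch of the failing configuration: prefix caching requires the
# sliding window to be disabled, but max_model_len (8192) then exceeds
# the model's sliding_window (4096) and raises the ValueError above.
llm = LLM(
    model="TheBloke/Vigostral-7B-Chat-AWQ",  # assumed HF repo id
    quantization="awq",
    enable_prefix_caching=True,
    disable_sliding_window=True,
    max_model_len=8192,
)
```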
Is it possible to increase max_model_len above the default sliding window value when prefix caching is enabled?
Many thanks for your help