vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: ROPE scaling supported by vLLM gemma2 #6175

Open kkk935208447 opened 1 month ago

kkk935208447 commented 1 month ago

🚀 The feature, motivation and pitch

Currently, vLLM's Gemma 2 implementation does not support RoPE scaling. I sincerely hope that support for it will be added in the future.
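
For reference, this is roughly what the requested behavior would look like using vLLM's existing `rope_scaling` engine argument, which some other architectures already honor. Whether these exact keys would apply to Gemma 2 once supported is an assumption on my part:

```python
# Hedged sketch: the rope_scaling engine argument as passed for models that
# already support it in vLLM. The exact keys, and whether Gemma 2 would accept
# them, are assumptions here, not confirmed behavior.
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-2-9b-it",
    max_model_len=16384,                             # desired extended context
    rope_scaling={"type": "linear", "factor": 2.0},  # PI: 8192 * 2.0 = 16384
)

outputs = llm.generate(["Hello, Gemma!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```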

kkk935208447 commented 1 month ago

Yes, I've run into the same problem. I used Position Interpolation (PI) to extend Gemma 2's context length to 16K. Currently there are two issues: first, vLLM does not support inference beyond 4096 tokens for Gemma 2, and second, it does not support a custom RoPE length extension for this model.
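
In case it helps clarify the request, here is a minimal sketch of what Position Interpolation (the "linear" RoPE scaling mentioned above) does: position indices are compressed by a factor so a model trained on 8192 positions can address 16384. The function names are illustrative, not vLLM internals:

```python
# Minimal PI sketch: divide position indices by the scaling factor before
# computing the rotary angles. Names here are illustrative only.
import torch

def rope_inv_freq(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(positions: torch.Tensor, head_dim: int,
                scaling_factor: float = 1.0) -> torch.Tensor:
    # PI ("linear" scaling): scale the position index down, then multiply by
    # the inverse frequencies as usual.
    scaled_positions = positions.float() / scaling_factor
    return torch.outer(scaled_positions, rope_inv_freq(head_dim))

# Extending an 8192-token model to 16384 tokens corresponds to factor 2.0.
angles = rope_angles(torch.arange(16384), head_dim=128, scaling_factor=2.0)
print(angles.shape)  # torch.Size([16384, 64])
```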