🚀 The feature, motivation and pitch
Thanks for fixing the soft-capping issue for the Gemma 2 models in the last release! I noticed there is still a comment and a warning when serving Gemma 2 models.
Are there any plans to support sliding window attention for the odd layers? Additionally, are there any benchmarks on the performance impact of not using sliding windows on those layers? Cc @WoosukKwon
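For context, a minimal sketch of the alternating attention pattern in question (this is illustrative only, not vLLM's or Gemma 2's actual implementation; the window size and layer parity here are assumptions for the example):

```python
# Sketch (not vLLM's implementation): Gemma 2 interleaves global and
# sliding-window attention across layers. In a sliding-window layer,
# query position i may only attend to key positions in
# [i - window + 1, i] (causal + window); global layers use the full
# causal mask. Layer parity and window size below are illustrative.

def allowed_positions(layer_idx: int, query_pos: int, window: int = 4) -> list[int]:
    """Key positions that query_pos may attend to in the given layer."""
    if layer_idx % 2 == 1:  # assumed: odd layers use sliding-window attention
        lo = max(0, query_pos - window + 1)
    else:                   # even layers: full causal (global) attention
        lo = 0
    return list(range(lo, query_pos + 1))

# With window=4, query position 6 in a sliding-window layer sees only
# the last 4 positions, while a global layer sees all 7.
print(allowed_positions(1, 6))  # -> [3, 4, 5, 6]
print(allowed_positions(0, 6))  # -> [0, 1, 2, 3, 4, 5, 6]
```

Not applying the window on those layers (i.e., treating every layer as global, as the warning suggests happens today) still produces a superset of the intended attention context, which is presumably why it works at all, but it changes the effective receptive field versus the reference model.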
Alternatives
No response
Additional context
No response