Actually, I will close this in favor of https://github.com/vllm-project/vllm/issues/5152.
Your current environment
CI environment
🐛 Describe the bug
See https://github.com/vllm-project/vllm/pull/5286 and https://github.com/vllm-project/vllm/issues/5152
My guess is that the way we encode multiple query tokens per sequence in a single attention kernel invocation breaks the flash_attn contract somehow.
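For context on what "multiple query tokens per sequence in one kernel invocation" means here: below is a minimal sketch, not vLLM's actual code path, of how a speculative-decoding scorer might pack several query tokens per sequence into one varlen call, assuming flash-attn 2.x's `flash_attn_varlen_func`. All shapes and sequence lengths are hypothetical. One suspect for a broken contract is the causal-mask alignment when `seqlen_q < seqlen_k`, which flash-attn changed from top-left to bottom-right in 2.1.

```python
# Minimal sketch (illustrative only) of packing multiple query tokens per
# sequence into a single flash_attn varlen invocation.
import torch
from flash_attn import flash_attn_varlen_func

num_heads, head_dim = 8, 64
# Two sequences, each scoring 3 speculative query tokens against a longer
# KV history (lengths are hypothetical).
query_lens = [3, 3]
kv_lens = [10, 17]

q = torch.randn(sum(query_lens), num_heads, head_dim,
                dtype=torch.float16, device="cuda")
k = torch.randn(sum(kv_lens), num_heads, head_dim,
                dtype=torch.float16, device="cuda")
v = torch.randn_like(k)

# Cumulative sequence lengths delimit each sequence's tokens in the packed batch.
cu_seqlens_q = torch.tensor([0, 3, 6], dtype=torch.int32, device="cuda")
cu_seqlens_k = torch.tensor([0, 10, 27], dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_k=cu_seqlens_k,
    max_seqlen_q=max(query_lens),
    max_seqlen_k=max(kv_lens),
    # With seqlen_q < seqlen_k, flash-attn >= 2.1 aligns the causal mask to
    # the bottom-right, so each query token attends to all earlier KV tokens
    # plus itself; earlier releases aligned top-left, which silently changes
    # which KV positions each query token can see.
    causal=True,
)
```

If the kernel's mask alignment (or any other part of its contract for `seqlen_q > 1` during decode) differs from what the caller assumes, the mismatch would surface exactly as subtle correctness failures like the ones linked above.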