This issue occurs when vLLM fails to import FlashInfer.
Make sure you have it installed, built against the matching PyTorch version (2.3) and CUDA version (likely 12.1):
```
wget https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl
pip install flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl
```
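If the import still fails after installing the wheel, a quick sanity check (a minimal sketch using standard PyTorch attributes) is to confirm that the environment matches what the wheel was built for:

```python
# Sketch: verify the environment matches the FlashInfer wheel
# (built for PyTorch 2.3 and CUDA 12.1) and that the import succeeds.
import torch

print("PyTorch:", torch.__version__)   # expect something like 2.3.x+cu121
print("CUDA:", torch.version.cuda)     # expect 12.1

try:
    import flashinfer  # the module vLLM tries to import for the FLASHINFER backend
    print("FlashInfer imported successfully")
except ImportError as exc:
    print("FlashInfer import failed:", exc)
```

Note that the wheel above is built for Python 3.10 (cp310), so the interpreter version has to match as well.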
Your current environment
🐛 Describe the bug
When trying to load gemma-2-27b using vLLM, I encountered an error after setting the attention backend variable to FLASHINFER. I've included the minimal amount of code to reproduce the error on my cluster and the error log from running this code. I'd appreciate any insights into how to resolve this issue.
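For context, a minimal sketch of the kind of setup being described (not the reporter's exact script, which is omitted here; the model id and default settings are assumptions based on common vLLM usage):

```python
# Sketch of the reported setup: select the FlashInfer attention backend
# and load gemma-2-27b with vLLM. The failure reportedly occurs at model load.
import os

# Must be set before importing vllm so the backend choice takes effect.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-27b")  # assumed Hugging Face model id
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```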