epark001 opened 4 weeks ago
Me too. Does anyone know what change is needed to bring back CUDA 11.8? Maybe I can test it locally first.
me too
I built a cuda118 version from the https://github.com/vllm-project/flash-attention/tree/v2.5.8.post2 source code. You can download the wheel from https://github.com/zhaotyer/vllm_whl_repo/blob/master/vllm_flash_attn-2.5.8.post2-cp38-cp38-linux_x86_64.whl
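In case it helps, here is a minimal post-install sanity check for that wheel (a sketch, not from this thread; it assumes the importable module is named `vllm_flash_attn`, matching the wheel, and that your local torch is a cu118 build):

```python
# Sketch: verify a locally built cu118 vllm-flash-attn wheel loads against a cu118 torch.
import torch
import vllm_flash_attn  # the import fails here if the wheel's CUDA/ABI does not match torch

print("torch CUDA version:", torch.version.cuda)          # expected: "11.8"
print("vllm_flash_attn loaded from:", vllm_flash_attn.__file__)

assert torch.version.cuda is not None and torch.version.cuda.startswith("11.8"), \
    "torch itself is not a cu118 build; install a +cu118 torch first"
```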
Could you build a cu118 + Python 3.9 version? Many thanks.
@heianzhihuo have you got a whl with cu118 and Python 3.9? I'm looking for it too.
I'm looking for a cu118 + Python 3.9 build too.
🚀 The feature, motivation and pitch
vllm-flash-attn currently does not seem to support cu118.
The original flash-attn project supports cu118 and vLLM itself supports cu118, so a cu118 version of vllm-flash-attn would be helpful.
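To make the gap concrete, here is a rough sketch of the check a cu118 user fails today (the assumption, not stated in this issue, is that the published vllm-flash-attn wheels only target CUDA 12.x and the importable module is `vllm_flash_attn`):

```python
# Sketch of the compatibility gap: cu118 torch with no matching vllm-flash-attn wheel.
import importlib.util
import torch

cuda = torch.version.cuda or ""
has_vllm_fa = importlib.util.find_spec("vllm_flash_attn") is not None

if cuda.startswith("11.8") and not has_vllm_fa:
    # This is the situation this issue asks to fix: torch/vLLM run on cu118,
    # but there is no cu118 vllm-flash-attn wheel to install.
    print("cu118 torch detected, but no vllm_flash_attn build is available")
```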
Alternatives
No response
Additional context
No response