Here are some approximate instructions for getting it working:
https://github.com/vllm-project/vllm/issues/6576
You can use the master branch; at least it gives me fewer problems than the fp8-gemm branch.
But on an MI300X, FP8 performance is more than 3 times slower than half precision.
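For reference, a minimal sketch of how such a comparison could be run with vLLM's offline Python API. The model name, prompts, and token counts are placeholders, not the exact setup used above; run each configuration in a separate process if GPU memory is not released between runs.

```python
# Rough throughput comparison between half precision and FP8 with vLLM's
# offline API. Model and prompts are placeholders; adjust for your setup.
import time

from vllm import LLM, SamplingParams

PROMPTS = ["Explain mixed-precision inference in one paragraph."] * 32
SAMPLING = SamplingParams(temperature=0.0, max_tokens=128)


def tokens_per_second(quantization):
    # quantization=None loads the default half-precision weights;
    # quantization="fp8" requests FP8 quantization (the path discussed above).
    llm = LLM(model="meta-llama/Llama-2-7b-hf", quantization=quantization)
    start = time.perf_counter()
    outputs = llm.generate(PROMPTS, SAMPLING)
    elapsed = time.perf_counter() - start
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    return generated / elapsed


if __name__ == "__main__":
    print(f"half: {tokens_per_second(None):.1f} tok/s")
    print(f"fp8 : {tokens_per_second('fp8'):.1f} tok/s")
```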
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
I am trying out FP8 support on AMD GPUs (MI250, MI300), but vLLM does not yet seem to support FP8 quantization on AMD GPUs. Is there any timeline for when this will be available?
🐛 Describe the bug
I get the error "fp8 quantization is currently not supported in ROCm" when running vLLM with quantization=fp8 on AMD GPUs. I am using AMD MI250 GPUs to run the vLLM inference service.
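A minimal sketch of how the error can be reproduced, assuming a ROCm build of vLLM; the model name is only an example and the error text is as quoted above.

```python
from vllm import LLM

# On a ROCm build, constructing the engine with FP8 quantization fails with
# the error quoted above: "fp8 quantization is currently not supported in ROCm".
llm = LLM(model="meta-llama/Llama-2-7b-hf", quantization="fp8")
print(llm.generate(["Hello, world!"])[0].outputs[0].text)
```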