rathnaum opened this issue 1 month ago
Here are approximate instructions for getting it working:
https://github.com/vllm-project/vllm/issues/6576
You can use the master branch; at least it gives me fewer problems than the fp8-gemm branch.
But on an MI300X, FP8 performance is more than 3 times slower than half precision.
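For what it's worth, here is a rough sketch of how the two configurations could be compared; the model name and prompts are only examples, and each configuration should be run in a separate process so the engines don't compete for GPU memory:

```python
import time
from vllm import LLM, SamplingParams

def measure(quantization=None, dtype="auto"):
    # Build an engine with the given precision settings and time a small batch.
    llm = LLM(model="meta-llama/Llama-2-7b-hf",
              quantization=quantization, dtype=dtype)
    prompts = ["Hello, my name is"] * 32
    params = SamplingParams(max_tokens=128)
    start = time.time()
    llm.generate(prompts, params)
    return time.time() - start

# Run once with measure(dtype="half") and once with measure(quantization="fp8"),
# each in its own process, and compare the elapsed times.
```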
Your current environment
I am trying out FP8 support on AMD GPUs (MI250, MI300), but vLLM does not yet seem to support FP8 quantization on AMD GPUs. Is there a timeline for when this will be available?
🐛 Describe the bug
Running vLLM with quantization=fp8 on AMD GPUs fails with the error "fp8 quantization is currently not supported in ROCm". I am using AMD MI250 GPUs to run the vLLM inference service.
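A minimal sketch of how the error is typically hit from the Python API (the model name is only an example, not my actual setup):

```python
from vllm import LLM

# On a ROCm build without FP8 support this raises:
# "fp8 quantization is currently not supported in ROCm"
llm = LLM(model="meta-llama/Llama-2-7b-hf", quantization="fp8")
```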