TopIdiot opened this issue 3 months ago
fp8 not yet supported for Qwen. WIP PR: https://github.com/vllm-project/vllm/pull/6088
@robertgshaw2-neuralmagic Hello, the error still exists in version 0.5.3.
Fp8 is now supported for Qwen, but MoE Fp8 requires compute capability 9.0 (i.e. Hopper GPUs).
Our MoE kernels are currently implemented in Triton, which requires triton==3.0 for Fp8 on Ada Lovelace. We are limited by the Triton version that PyTorch pins.
We look forward to supporting Fp8 MoE on Ada Lovelace once these dependencies are updated.
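As a quick sanity check before filing a similar report, you can verify whether your GPU meets the compute-capability floor mentioned above. This is a minimal sketch (not part of vLLM); the helper name `meets_fp8_moe_requirement` is hypothetical, while `torch.cuda.get_device_capability()` is the real PyTorch API. Hopper reports 9.0; Ada Lovelace reports 8.9, which is why it falls back to the triton==3.0 path.

```python
# Hedged sketch: check whether the active GPU meets the compute-capability
# floor for vLLM's Fp8 MoE kernels (9.0, i.e. Hopper), per the comment above.

def meets_fp8_moe_requirement(capability):
    """capability is the (major, minor) pair returned by
    torch.cuda.get_device_capability(). Returns True for Hopper (9.0) or newer."""
    return tuple(capability) >= (9, 0)

if __name__ == "__main__":
    import torch  # real API: torch.cuda.get_device_capability()
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        status = "ok for Fp8 MoE" if meets_fp8_moe_requirement(cap) else "not supported yet"
        print(f"compute capability {cap[0]}.{cap[1]}: {status}")
    else:
        print("no CUDA device visible")
```

Running this on an Ada Lovelace card (8.9) would report "not supported yet", matching the error in this issue.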
Your current environment
🐛 Describe the bug
After loading an fp8 Qwen2 MoE model, the error occurs.
The config.json is: