Open courage17340 opened 3 weeks ago
@mgoin
🚀 The feature, motivation and pitch
I found that some kernels use 32-bit integers as indices, which can easily overflow. I think changing them to int64_t (or another 64-bit type) would be safer and should have little impact on performance.
For example, if a tensor's numel >= 2^31, the fp8 quantization fails. https://github.com/vllm-project/vllm/blob/edd5fe5fa29b8f9cc5fa37a30cc7211e0ff37067/csrc/quantization/fp8/common.cu#L43
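To illustrate the failure mode, here is a minimal host-side C++ sketch of the index arithmetic a kernel typically performs (block index times block size plus thread index). It is not the code from common.cu; the helper names flat_index_32, flat_index_64, and fits_in_int32 are hypothetical, and unsigned 32-bit types are used only so that the wraparound is well defined in C++.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: flat element index computed entirely in 32 bits.
// Unsigned arithmetic wraps modulo 2^32; with signed int the same
// overflow would be undefined behavior.
inline std::uint32_t flat_index_32(std::uint32_t block_idx,
                                   std::uint32_t block_dim,
                                   std::uint32_t thread_idx) {
  return block_idx * block_dim + thread_idx;
}

// Hypothetical helper: the same index, widened to 64 bits before the
// multiplication so the product cannot wrap for realistic tensor sizes.
inline std::int64_t flat_index_64(std::uint32_t block_idx,
                                  std::uint32_t block_dim,
                                  std::uint32_t thread_idx) {
  return static_cast<std::int64_t>(block_idx) * block_dim + thread_idx;
}

// Hypothetical guard: true if every element index of a tensor with
// `numel` elements can be represented in a signed 32-bit integer.
inline bool fits_in_int32(std::int64_t numel) {
  return numel <= std::numeric_limits<std::int32_t>::max();
}
```

With block_idx = 4194304 and block_dim = 1024 the product is exactly 2^32, so the 32-bit version wraps and aliases a low index, while the 64-bit version returns the intended one. The cast has to happen before the multiplication; widening only the final result would not help.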
Alternatives
No response
Additional context
No response