mszhanyi opened 1 year ago
And to my surprise, the quantized model tests on GPU (T4) produce the same results as the old test data generated without VNNI. So, the tests passed with incorrect test data.
Just came back from my vacation -- thanks for bringing this up. For now, at least in the ONNX Model Zoo repo, I slightly lean toward keeping only a single valid test_data_set, created by the CPU EP, for simplicity. It would also reduce the burden on contributors.
I would like to understand more about the result difference for quantized models among:
As you mentioned, it is surprising that 1 = 4 != 2. Perhaps we can make a further decision once we have confirmed whether this result is expected.
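For reference, here is a minimal sketch of how a single test_data_set could be generated with the CPU EP in the Model Zoo layout (input_*.pb / output_*.pb TensorProto files). The model path and the random float32 inputs are placeholders for illustration only, not the actual test-data generation script.

```python
# Minimal sketch: generate one test_data_set with the CPU EP.
# Assumptions: "model.onnx" is the quantized model under test and its inputs
# are float32; random inputs are used purely for illustration.
import os
import numpy as np
import onnx
from onnx import numpy_helper
import onnxruntime as ort

out_dir = "test_data_set_0"
os.makedirs(out_dir, exist_ok=True)

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Save inputs as input_*.pb TensorProto files (Model Zoo layout).
feeds = {}
for i, inp in enumerate(sess.get_inputs()):
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # symbolic dims -> 1
    arr = np.random.rand(*shape).astype(np.float32)
    feeds[inp.name] = arr
    onnx.save_tensor(numpy_helper.from_array(arr, inp.name),
                     os.path.join(out_dir, f"input_{i}.pb"))

# Run with the CPU EP and save the reference outputs as output_*.pb.
for i, (meta, arr) in enumerate(zip(sess.get_outputs(), sess.run(None, feeds))):
    onnx.save_tensor(numpy_helper.from_array(arr, meta.name),
                     os.path.join(out_dir, f"output_{i}.pb"))
```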
Ask a Question
Since the GPU machines in CI have been upgraded from NV6 to T4, it looks like quantized model tests on GPU should be added too.
Hardware support is required to achieve better performance with quantization on GPUs. You need a device that supports Tensor Core int8 computation, like T4 or A100.
https://onnxruntime.ai/docs/performance/quantization.html#quantization-on-gpu
But it looks like the test result on CPU with VNNI is different from the result on GPU. Is this expected? @yufenglee If it's expected, shall we add test data generated on GPU? @jcwchen @snnn
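For context, a minimal sketch of how the CPU EP and CUDA EP results could be compared against the stored test data with onnxruntime. The "model.onnx" path, the test_data_set_0 directory, and the tolerances are assumptions for illustration, not the actual CI test code.

```python
# Minimal sketch: run a quantized model with the CPU EP and the CUDA EP
# (e.g. on a T4) and compare both against a stored test_data_set.
import glob
import numpy as np
import onnx
from onnx import numpy_helper
import onnxruntime as ort

def load_tensors(pattern):
    # Load Model Zoo style .pb TensorProto files as numpy arrays.
    return [numpy_helper.to_array(onnx.load_tensor(f)) for f in sorted(glob.glob(pattern))]

inputs = load_tensors("test_data_set_0/input_*.pb")
expected = load_tensors("test_data_set_0/output_*.pb")

for providers in (["CPUExecutionProvider"], ["CUDAExecutionProvider"]):
    sess = ort.InferenceSession("model.onnx", providers=providers)
    feeds = {meta.name: arr for meta, arr in zip(sess.get_inputs(), inputs)}
    for got, ref in zip(sess.run(None, feeds), expected):
        # Quantized results can legitimately differ across EPs/hardware,
        # so compare with a tolerance rather than exact equality.
        np.testing.assert_allclose(got, ref, rtol=1e-2, atol=1e-2)
    print(f"{providers[0]}: outputs match the stored test data within tolerance")
```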