mszhanyi opened 1 year ago
And to my surprise, the quantized model tests on GPU (T4) produce the same results as the old test data generated without VNNI. So, the tests passed with incorrect test data.
Just came back from my vacation -- thanks for bringing this up. For now, at least in the ONNX Model Zoo repo, I slightly lean toward keeping only a single valid test_data_set, created by the CPU EP, for simplicity. It would also reduce the burden on contributors.
I would like to understand more about the result difference for quantized models among:
As you mentioned, it is surprising that 1 = 4 != 2. Perhaps we can make a further decision once we have confirmed whether this result is expected.
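For reference, here is a minimal sketch of how a single test_data_set could be generated with the CPU EP in the Model Zoo layout (input_*.pb / output_*.pb TensorProto files). The model path and the random float32 inputs are placeholders for illustration only, not the actual test-data generation script.

```python
# Minimal sketch: generate one test_data_set with the CPU EP.
# Assumptions: "model.onnx" is the quantized model under test and its inputs
# are float32; random inputs are used purely for illustration.
import os
import numpy as np
import onnx
from onnx import numpy_helper
import onnxruntime as ort

out_dir = "test_data_set_0"
os.makedirs(out_dir, exist_ok=True)

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Save inputs as input_*.pb TensorProto files (Model Zoo layout).
feeds = {}
for i, inp in enumerate(sess.get_inputs()):
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # symbolic dims -> 1
    arr = np.random.rand(*shape).astype(np.float32)
    feeds[inp.name] = arr
    onnx.save_tensor(numpy_helper.from_array(arr, inp.name),
                     os.path.join(out_dir, f"input_{i}.pb"))

# Run with the CPU EP and save the reference outputs as output_*.pb.
for i, (meta, arr) in enumerate(zip(sess.get_outputs(), sess.run(None, feeds))):
    onnx.save_tensor(numpy_helper.from_array(arr, meta.name),
                     os.path.join(out_dir, f"output_{i}.pb"))
```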
Ask a Question
Since the GPU machines in CI have been upgraded from NV6 to T4, it looks like quantized model tests on GPU should be added too.
Hardware support is required to achieve better performance with quantization on GPUs. You need a device that supports Tensor Core int8 computation, like T4 or A100.
https://onnxruntime.ai/docs/performance/quantization.html#quantization-on-gpu
But it looks like the test result on CPU with VNNI is different from the result on GPU. Is this expected? @yufenglee If it's expected, shall we add test data generated on GPU? @jcwchen @snnn
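For context, a minimal sketch of how the CPU EP and CUDA EP results could be compared against the stored test data with onnxruntime. The "model.onnx" path, the test_data_set_0 directory, and the tolerances are assumptions for illustration, not the actual CI test code.

```python
# Minimal sketch: run a quantized model with the CPU EP and the CUDA EP
# (e.g. on a T4) and compare both against a stored test_data_set.
import glob
import numpy as np
import onnx
from onnx import numpy_helper
import onnxruntime as ort

def load_tensors(pattern):
    # Load Model Zoo style .pb TensorProto files as numpy arrays.
    return [numpy_helper.to_array(onnx.load_tensor(f)) for f in sorted(glob.glob(pattern))]

inputs = load_tensors("test_data_set_0/input_*.pb")
expected = load_tensors("test_data_set_0/output_*.pb")

for providers in (["CPUExecutionProvider"], ["CUDAExecutionProvider"]):
    sess = ort.InferenceSession("model.onnx", providers=providers)
    feeds = {meta.name: arr for meta, arr in zip(sess.get_inputs(), inputs)}
    for got, ref in zip(sess.run(None, feeds), expected):
        # Quantized results can legitimately differ across EPs/hardware,
        # so compare with a tolerance rather than exact equality.
        np.testing.assert_allclose(got, ref, rtol=1e-2, atol=1e-2)
    print(f"{providers[0]}: outputs match the stored test data within tolerance")
```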