Open SangbumChoi opened 2 years ago
I am also seeing similar results. Is there a simple explanation for this?
I think they are affected by torch's multithreading; you can compare by setting `torch.set_num_threads(1)`. I got this result:

FP32 CPU Inference Latency: 6.59 ms / sample
INT8 CPU Inference Latency: 3.03 ms / sample
I ran the code without any modification or calibration and got these results on my GTX 1070:

FP32 evaluation accuracy: 0.781
INT8 evaluation accuracy: 0.779
FP32 CPU Inference Latency: 6.41 ms / sample
FP32 CUDA Inference Latency: 3.01 ms / sample
INT8 CPU Inference Latency: 2.67 ms / sample
INT8 JIT CPU Inference Latency: 0.91 ms / sample
> I think they are affected by torch's multithreading; you can compare by setting `torch.set_num_threads(1)`. I got this result:
> FP32 CPU Inference Latency: 6.59 ms / sample
> INT8 CPU Inference Latency: 3.03 ms / sample

It seems odd that this fixes the latency gap, since both inferences are performed on the CPU. Setting any number of threads should affect them equally...
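One way to sanity-check the threading hypothesis is to time both models with the same harness after pinning the thread count. The helper below is a generic sketch; `measure_latency_ms` is a hypothetical name, and the `fp32_model`/`int8_model` calls in the comment are placeholders for the models from the tutorial:

```python
import time

def measure_latency_ms(fn, warmup=10, runs=100):
    """Return the mean latency of fn() in milliseconds per call."""
    for _ in range(warmup):          # warm up caches / JIT before timing
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

# In the actual experiment (requires PyTorch), pin the thread count first
# so FP32 and INT8 are compared under identical CPU parallelism:
#   import torch
#   torch.set_num_threads(1)
#   fp32_ms = measure_latency_ms(lambda: fp32_model(sample))
#   int8_ms = measure_latency_ms(lambda: int8_model(sample))
```

If the INT8 advantage only appears with a single thread, the earlier multi-threaded numbers were likely dominated by thread scheduling rather than by the kernels themselves.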
Hi, thanks for the great code.
It works well without any modification.
However, the results I got seem odd. My GPU is a Tesla V100-SXM2-32GB, my CPU is an Intel(R) Xeon(R) Gold 5120 @ 2.20GHz, and the OS is Linux.
First trial
FP32 CPU Inference Latency: 3.54 ms / sample
FP32 CUDA Inference Latency: 3.92 ms / sample
INT8 CPU Inference Latency: 11.76 ms / sample
INT8 JIT CPU Inference Latency: 4.50 ms / sample
Second trial
FP32 CPU Inference Latency: 3.70 ms / sample
FP32 CUDA Inference Latency: 3.87 ms / sample
INT8 CPU Inference Latency: 9.38 ms / sample
INT8 JIT CPU Inference Latency: 6.60 ms / sample
Third trial
FP32 CPU Inference Latency: 3.88 ms / sample
FP32 CUDA Inference Latency: 3.92 ms / sample
INT8 CPU Inference Latency: 19.98 ms / sample
INT8 JIT CPU Inference Latency: 4.65 ms / sample
Those are the results I got from your code. I expected the INT8 models to be much faster than FP32. Do you have any explanation or ideas about the situation above?