neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

yolov5 speed is very slow #177

Closed XiaoJiNu closed 2 years ago

XiaoJiNu commented 3 years ago

Hi, I tested the yolov5s model on Ubuntu 16.04, but the speed is very slow, as the figure below shows. It warns that VNNI instructions were not detected, so quantization speedup is not well supported. What's the problem, and how should I solve it?

image

My CPU information is shown in the figure below: image

markurtz commented 3 years ago

Hi @XiaoJiNu, when this error pops up it means that the current CPU does not have the VNNI instruction set available. VNNI is required to run quantized models performantly on CPUs with the DeepSparse engine.

We recommend trying to deploy on a VNNI capable CPU if possible such as the c5.12xlarge on AWS. If you supply more info about where you are trying to deploy, then we can give better guidance for what would be appropriate.

If you're unable to change the instance type you're running on, then unfortunately quantization won't give a speedup. The pruned-only networks will still be supported, though, and will run faster than both the dense and the sparse-quantized networks on that hardware.

Thanks, Mark

XiaoJiNu commented 3 years ago

@markurtz Thank you. Now I use an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, which supports AVX-512, as the first figure below shows.

image

When I run yolov5l with the following command, the inference time is as the second figure shows. Is the inference time right?

```shell
python annotate.py zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned-aggressive_98 --source /data1/youren/data/hf/golden-pin-cut-test-data/ --image-shape 640 640
```

image

markurtz commented 3 years ago

Hi @XiaoJiNu, could you run the following code to confirm support for VNNI and AVX-512? From a quick lookup of that CPU, I wasn't able to find definite support listed for either, but I did see support for AVX2.
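The snippet referenced above doesn't appear in the comment; a minimal sketch of such a check on Linux, parsing the `flags` line of `/proc/cpuinfo` (the flag names `avx512f` and `avx512_vnni` are the standard kernel feature names; this is an assumption about what the original code did), could look like:

```python
import os

def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags parsed from /proc/cpuinfo content."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags\t\t: fpu vme ... avx512f avx512_vnni ..."
            return set(line.split(":", 1)[1].split())
    return set()

def supports(flags, feature):
    """True if the given feature flag is advertised by the CPU."""
    return feature in flags

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    print("AVX-512F:", supports(flags, "avx512f"))
    print("VNNI:   ", supports(flags, "avx512_vnni"))
```

If `avx512_vnni` is missing from the flags, the quantization warning from the engine is expected on that machine.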

If VNNI isn't supported, then the pruned model should still give a speedup, as your results show. We can't guarantee the exact performance, but you should see at least 2x better performance compared to the baseline on the same CPU. Could you run a comparison of the FP32 pruned model against the FP32 baseline model?
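One simple way to run that comparison is to time both models with the same harness; a sketch (the `run_model` callables standing in for engine invocations, e.g. a compiled DeepSparse engine's run method, are hypothetical placeholders here):

```python
import time

def avg_latency_ms(run_model, inputs, warmup=5, iters=50):
    """Average per-inference latency in milliseconds for a model callable."""
    # Warm up so one-time costs (allocation, caching) don't skew the timing.
    for _ in range(warmup):
        run_model(inputs)
    start = time.perf_counter()
    for _ in range(iters):
        run_model(inputs)
    return (time.perf_counter() - start) / iters * 1000.0

# Usage (hypothetical): time the FP32 baseline and FP32 pruned engines
# on identical inputs, then report the speedup ratio.
#   baseline_ms = avg_latency_ms(baseline_engine.run, [batch])
#   pruned_ms   = avg_latency_ms(pruned_engine.run, [batch])
#   print(f"speedup: {baseline_ms / pruned_ms:.2f}x")
```

Running both models on the same CPU with the same batch size keeps the comparison fair.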

Thanks, Mark

XiaoJiNu commented 3 years ago

@markurtz Ok, I will try it when I have free time.

jeanniefinks commented 3 years ago

Hi @XiaoJiNu Just checking in to see how you are doing with this. Should we close this issue for now or leave it open? Thank you, Jeannie

jeanniefinks commented 2 years ago

Hello @XiaoJiNu As there have been no further comments on this, we will close out this issue. Feel free to re-open it, however, if you would like to continue the conversation. Thank you! Jeannie / Neural Magic