neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Deep Sparse vs onnx models #151

Closed yasesf93 closed 3 years ago

yasesf93 commented 3 years ago

Hello, I have trained several pruned models and saved the weights (using other pruning methods), then exported the saved models to ONNX with PyTorch. I'm interested in comparing their inference times, but the results are confusing: the trends do not stay the same when I change the batch size, and for some batch-size/model combinations ONNX Runtime is faster than DeepSparse. I was wondering if there is an explanation for that, or whether I'm missing something.
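
For reference, this is roughly how I am timing each model (a minimal sketch, assuming the comparison is against ONNX Runtime; the model path, input shape, iteration count, and batch size are placeholders):

```python
import time

import numpy as np
import onnxruntime
from deepsparse import compile_model

model_path = "pruned_model.onnx"   # placeholder: exported with torch.onnx.export
batch_size = 16                    # placeholder: varied across runs
iters = 100
sample = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]

# DeepSparse: compile the ONNX file for this batch size, then time repeated runs
engine = compile_model(model_path, batch_size=batch_size)
start = time.perf_counter()
for _ in range(iters):
    engine.run(sample)
deepsparse_ms = (time.perf_counter() - start) * 1000 / iters

# ONNX Runtime: run the same inputs through an InferenceSession
session = onnxruntime.InferenceSession(model_path)
input_name = session.get_inputs()[0].name
start = time.perf_counter()
for _ in range(iters):
    session.run(None, {input_name: sample[0]})
ort_ms = (time.perf_counter() - start) * 1000 / iters

print(f"DeepSparse: {deepsparse_ms:.2f} ms/batch, ONNX Runtime: {ort_ms:.2f} ms/batch")
```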

mgoin commented 3 years ago

Hi there @yasesf93! The algorithms and implementations the DeepSparse Engine uses for each operation can vary greatly across hardware, model architectures, and input sizes, including batch size. Simply put, there are too many combinations to pick the perfect setup every time. In my experience, DeepSparse scales much better with large batch sizes and core counts compared to other frameworks, and with pruned models of high enough sparsity you should almost always see a speedup. The main case where you wouldn't is when too many operations in the model lack optimized implementations in the engine. Enabling the diagnostic logs (see the sketch below) will give some insight into what is running in the optimized engine, and we can help you here if you share the types of models you are attempting to run. Thanks!
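
As a minimal sketch of turning the diagnostic logs on from Python, you can set the logging environment variable before the engine is imported (the `NM_LOGGING_LEVEL=diagnose` setting is from memory; please check the diagnostics/debugging docs for the full set of options, and the model path and input shape below are placeholders):

```python
import os

# Must be set before the DeepSparse engine library is loaded;
# "diagnose" prints per-operation details as the model compiles and runs.
os.environ["NM_LOGGING_LEVEL"] = "diagnose"

import numpy as np
from deepsparse import compile_model

engine = compile_model("pruned_model.onnx", batch_size=16)  # placeholder path and batch size
engine.run([np.random.rand(16, 3, 224, 224).astype(np.float32)])
```

The log output should indicate which layers execute inside the optimized engine and which fall back to unoptimized paths, which is usually the first thing to check when a sparse model isn't showing the expected speedup.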