Closed: yasesf93 closed this issue 3 years ago.
Hi there @yasesf93 ! The algorithms and implementations the DeepSparse Engine uses for each operation can vary greatly across hardware, model architectures, and input sizes, including batch size. Simply put, there are too many combinations to pick the perfect setup every time. In my experience, DeepSparse scales much better with larger batch sizes and core counts than other frameworks do. If pruned models with high enough sparsity are involved, you should almost always see a speedup. The main case where you wouldn't is when too many operations in the model lack optimized implementations in the engine.

Enabling the diagnostic logs will give some insight into what is running in the optimized engine, and we can help you here if you share the types of models you are attempting to run. Thanks!
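As a rough sketch, the diagnostic logs can be turned on via an environment variable before the engine is imported (this assumes the `NM_LOGGING_LEVEL` variable described in Neural Magic's diagnostics/debugging docs; the model path and input shape below are placeholders for your own files):

```python
import os

# Assumed diagnostics setting per Neural Magic's logging docs; must be set
# before deepsparse is imported so the engine picks it up.
os.environ["NM_LOGGING_LEVEL"] = "diagnose"

import numpy as np
from deepsparse import compile_model

onnx_filepath = "pruned_model.onnx"  # placeholder path to your exported model
batch_size = 64

# Compile the ONNX model for the DeepSparse Engine; the diagnostic log output
# will indicate which operations run in the optimized engine.
engine = compile_model(onnx_filepath, batch_size=batch_size)

# Random input with a placeholder shape; match it to your model's real input.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
```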
Hello, I have trained several pruned models and saved the weights (using other pruning methods), then converted the saved models to ONNX (using torch). I'm interested in comparing their inference times. The results are confusing because the trends do not stay the same when I change the batch size. Also, for some batch size and model combinations, ONNX Runtime is faster than DeepSparse, which is confusing. I was wondering if there is an explanation for that, or if I'm missing something.
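Roughly, my comparison looks like the sketch below (model path, input shape, and iteration counts are placeholders; both frameworks get warmup runs before timing):

```python
import time

import numpy as np
import onnxruntime as ort
from deepsparse import compile_model

onnx_filepath = "pruned_model.onnx"   # placeholder path
batch_size = 16
iterations, warmup = 100, 10
x = np.random.rand(batch_size, 3, 224, 224).astype(np.float32)  # placeholder shape


def avg_latency(fn):
    # Warm up, then average wall-clock time per batch over the timed runs.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# DeepSparse: compile once for this batch size, then run repeatedly.
engine = compile_model(onnx_filepath, batch_size=batch_size)
ds_latency = avg_latency(lambda: engine.run([x]))

# ONNX Runtime: a standard CPU inference session on the same file.
sess = ort.InferenceSession(onnx_filepath)
input_name = sess.get_inputs()[0].name
ort_latency = avg_latency(lambda: sess.run(None, {input_name: x}))

print(f"DeepSparse:   {ds_latency * 1000:.2f} ms/batch")
print(f"ONNX Runtime: {ort_latency * 1000:.2f} ms/batch")
```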