Closed: yasesf93 closed this issue 3 years ago.
Hi there @yasesf93 ! The algorithms and implementations the DeepSparse Engine uses for each operation can vary greatly across hardware, model architectures, and input sizes, including batch size. Simply put, there are too many combinations to pick the perfect setup every time. In my experience, DeepSparse scales much better with larger batch sizes and core counts than other frameworks do. If pruned models with high enough sparsity are involved, you should almost always see a speedup. The main case where you wouldn't is when too many operations in the model lack optimized implementations in the engine.

Enabling the diagnostic logs will give some insight into what is running in the optimized engine, and we can help you here if you share the types of models you are attempting to run. Thanks!
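As a rough sketch, the diagnostic logs can be turned on via an environment variable before the engine is imported (this assumes the `NM_LOGGING_LEVEL` variable described in Neural Magic's diagnostics/debugging docs; the model path and input shape below are placeholders for your own files):

```python
import os

# Assumed diagnostics setting per Neural Magic's logging docs; must be set
# before deepsparse is imported so the engine picks it up.
os.environ["NM_LOGGING_LEVEL"] = "diagnose"

import numpy as np
from deepsparse import compile_model

onnx_filepath = "pruned_model.onnx"  # placeholder path to your exported model
batch_size = 64

# Compile the ONNX model for the DeepSparse Engine; the diagnostic log output
# will indicate which operations run in the optimized engine.
engine = compile_model(onnx_filepath, batch_size=batch_size)

# Random input with a placeholder shape; match it to your model's real input.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
```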
Hello, I have trained several pruned models and saved the weights (using other pruning methods), then converted the saved models to ONNX (using torch). I'm interested in comparing their inference times. The results are confusing because the trends do not stay the same when I change the batch size. Also, for some batch size and model combinations, ONNX Runtime is faster than DeepSparse, which is confusing. I was wondering if there is an explanation for that, or if I'm missing something.
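Roughly, my comparison looks like the sketch below (model path, input shape, and iteration counts are placeholders; both frameworks get warmup runs before timing):

```python
import time

import numpy as np
import onnxruntime as ort
from deepsparse import compile_model

onnx_filepath = "pruned_model.onnx"   # placeholder path
batch_size = 16
iterations, warmup = 100, 10
x = np.random.rand(batch_size, 3, 224, 224).astype(np.float32)  # placeholder shape


def avg_latency(fn):
    # Warm up, then average wall-clock time per batch over the timed runs.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# DeepSparse: compile once for this batch size, then run repeatedly.
engine = compile_model(onnx_filepath, batch_size=batch_size)
ds_latency = avg_latency(lambda: engine.run([x]))

# ONNX Runtime: a standard CPU inference session on the same file.
sess = ort.InferenceSession(onnx_filepath)
input_name = sess.get_inputs()[0].name
ort_latency = avg_latency(lambda: sess.run(None, {input_name: x}))

print(f"DeepSparse:   {ds_latency * 1000:.2f} ms/batch")
print(f"ONNX Runtime: {ort_latency * 1000:.2f} ms/batch")
```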