rsazizov opened 1 month ago
Model exported: https://drive.google.com/file/d/1ZDlRd6c1X05lrnxRThUo8FxuapS5Kgm7/view?usp=sharing
You can see that this style of Conv is not being folded to a ConvInteger correctly - @bfineran
@mgoin we'll need to take a look at the recipe and its application. ConvInteger requires two quantized inputs (weight and activation) to the Conv; here we see a quantized weight input and a quantized output (although the latter may be the input quantization of the following layer).
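One way to check whether quantization was folded correctly is to count the operator types in the exported graph: a properly folded INT8 graph should contain `ConvInteger` (or `QLinearConv`) nodes rather than float `Conv` nodes sandwiched between `QuantizeLinear`/`DequantizeLinear` pairs. A minimal sketch (the `onnx` package and the model path in the comment are assumptions, not part of the report):

```python
from collections import Counter

def summarize_ops(op_types):
    """Count operator types and report folded vs. unfolded convolutions."""
    counts = Counter(op_types)
    folded = counts.get("ConvInteger", 0) + counts.get("QLinearConv", 0)
    unfolded = counts.get("Conv", 0)
    return counts, folded, unfolded

# With a real export you would obtain op_types via the onnx package
# (assumed installed), e.g.:
#   import onnx
#   model = onnx.load("model.onnx")  # hypothetical path
#   op_types = [node.op_type for node in model.graph.node]
counts, folded, unfolded = summarize_ops(
    ["QuantizeLinear", "Conv", "DequantizeLinear", "ConvInteger"]
)
```

A high `unfolded` count alongside many `QuantizeLinear`/`DequantizeLinear` nodes is the pattern described above, where the Conv itself stays in float.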
@bfineran Thank you for the great work :)
Wanted to let you know that I am seeing exactly the same performance degradation as @rsazizov on yolov8n: from Throughput (items/sec): 110.0278 (sparsezoo-yolov8n) down to Throughput (items/sec): 15.5770 after converting the sparsezoo-yolov8n .pt model with the sparseml ONNX exporter. Is there any known bug or update on this issue?
Hi @imAhmadAsghar, we're aware of the issue and are looking into it internally. It doesn't seem to be a version compatibility issue, but you could try rolling back your sparseml/pytorch versions. The issue seems to be that the model now exports differently at the beginning (a simple Split node is now a few Slice nodes).
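To confirm that kind of structural drift between two exports (for example, one Split node replaced by several Slice nodes), you can diff the operator counts of the two ONNX graphs. A hedged sketch in plain Python, with the op-type lists shown inline for illustration (with real models they would come from `[n.op_type for n in onnx.load(path).graph.node]`, assuming the `onnx` package):

```python
from collections import Counter

def diff_ops(old_ops, new_ops):
    """Return {op_type: new_count - old_count} for ops whose count changed."""
    old, new = Counter(old_ops), Counter(new_ops)
    return {op: new[op] - old[op]
            for op in old.keys() | new.keys()
            if new[op] != old[op]}

# Illustrative lists standing in for the two exports' node op types.
delta = diff_ops(["Conv", "Split"], ["Conv", "Slice", "Slice", "Slice"])
# A Split replaced by Slices appears here as Split: -1, Slice: +3.
```

Any nonzero entries point at where the newer exporter changed the graph, which is the first thing to check when a previously supported model suddenly benchmarks as unsupported.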
@bfineran Thank you for your response.
I actually did not get the last part of your response, "The issue seems to be that the model exports differently now at the beginning (a simple split node is not a few slices)." Can you please explain what you mean by that in detail, if possible? I am not a performance/optimization engineer; I just want to use sparseml/deepsparse to speed up inference on CPU. However, the whole library is inconvenient and super foggy.
I have tested the following:
And here are the results. Performance test between the pruned and default model: as you can see in the plot above, pruning does nothing.
Performance test between the pruned vs. pruned-and-quantized model: I just don't get this plot. Nothing makes sense at all. Quantization does not work, and the model gets slower by a wide margin.
Right now, I am super confused, and it does not make any sense to use your library at all. I think I am missing a lot of information about the whole process. Can you please point me to a proper reference on where to start, because the one provided on the homepage is not leading me anywhere, as you can see from the results.
I would really love to get it running and achieve the results you promised.
@imAhmadAsghar Hi, were you able to find a fix for this? What is going wrong with the exports?
@yoloyash Hi, no I could not unfortunately.
Describe the bug
When exporting the YOLOv8s (pruned50-quant, model.pt from SparseZoo) model via the ONNX exporter (sparseml.ultralytics.export_onnx), its performance noticeably decreases compared to the ONNX model available in SparseZoo.
Expected behavior
Performance of the two ONNX files should be the same, as it is the same model.
Environment
Include all relevant environment information:
To Reproduce
Exact steps to reproduce the behavior:
Download model.onnx for yolov8s-pruned50-quant from SparseZoo (https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned50_quantized). Benchmark it using deepsparse.benchmark:
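A minimal benchmark invocation might look like the following (the local filename and the optional flags are assumptions; the exact interface is in `deepsparse.benchmark --help`):

```shell
# Benchmark the model.onnx downloaded from the SparseZoo page.
# -b sets the batch size, -t the run time in seconds (both optional).
deepsparse.benchmark model.onnx -b 1 -t 10
```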
Notice fraction_of_supported_ops: 1.0 and Throughput (items/sec): 87.1154.
Now download model.pt from the same page and export it to ONNX using the provided tool:
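The export step uses the `sparseml.ultralytics.export_onnx` entry point named above; the flag name below is an assumption and may differ across sparseml versions, so check the CLI's `--help` output for the exact interface:

```shell
# Export the downloaded checkpoint to ONNX (hypothetical flag name).
sparseml.ultralytics.export_onnx --model ./model.pt
```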
Conversion is successful. Now benchmark exported onnx model:
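Re-running the same benchmark against the freshly exported file (the output path is an assumption; use wherever the exporter wrote model.onnx) makes the two runs directly comparable:

```shell
# Benchmark the locally exported ONNX model with identical settings.
deepsparse.benchmark ./exported/model.onnx -b 1 -t 10
```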
Notice fraction_of_supported_ops: 0.0 and Throughput (items/sec): 20.2886.
Throughput decreased from ~87 down to ~20 items/sec for the same model.