neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

Sparse/Quantization Aware Training for YOLOv10 #2328

Closed yoloyash closed 6 days ago

yoloyash commented 2 weeks ago

Is your feature request related to a problem? Please describe. I need to reduce the model size of YOLOv10 while maintaining performance.

Describe the solution you'd like Sparse and Quantization Aware Training for YOLOv10: maintain sparsity while training and quantize to 8 bits. I just need some idea of how to go about implementing it. If I'm able to make it work, I'll open a PR.

Describe alternatives you've considered I have been looking at the YOLOv8 recipes by SparseML. While these have given me a lot of ideas, I'm not sure which layers to quantize and prune.

Additional context I'm also unsure what type of sparsification algorithm SparseML uses on the backend. Is it the rigged lottery (RigL)?

bfineran commented 2 weeks ago

Hi @yoloyash YOLOv10 is not currently on our roadmap, but starting from the YOLOv8 example should put you in a great spot. Targeting the convolutional layers (inputs and weights only) is a good place to start, and skipping the initial conv and predictor layers should also help with accuracy.
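
For illustration, a minimal sketch of what that could look like using the legacy SparseML recipe format and the `ScheduledModifierManager` training hook used in the YOLOv5/YOLOv8 examples. The stand-in model, epoch values, sparsity target, and modifier fields are assumptions to be checked against the actual YOLOv8 recipe, and the real YOLOv10 layer names for the first conv and detection head would need to be looked up in its module tree:

```python
# Rough sketch only: gradual magnitude pruning followed by QAT via a SparseML
# recipe, applied with the legacy ScheduledModifierManager API. Epoch values,
# sparsity targets, and the stand-in model below are placeholders.
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

recipe = """
modifiers:
  - !GMPruningModifier
    start_epoch: 0.0
    end_epoch: 40.0
    update_frequency: 1.0
    init_sparsity: 0.05
    final_sparsity: 0.8
    params: __ALL_PRUNABLE__   # in practice, list the conv weights and leave
                               # out the first conv and the detection head
    leave_enabled: True

  - !QuantizationModifier
    start_epoch: 40.0          # begin QAT once pruning has converged
"""

# Stand-in module; swap in the actual YOLOv10 nn.Module here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU(), torch.nn.Conv2d(16, 32, 3)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
steps_per_epoch = 1000  # len(train_loader) in a real setup

manager = ScheduledModifierManager.from_yaml(recipe)
optimizer = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

# ... run the usual training loop with the wrapped optimizer ...

manager.finalize(model)
```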

yoloyash commented 2 weeks ago

@bfineran thank you so much for the reply! I think this is enough to get me started. If you have any other suggestions, please let me know!

jeanniefinks commented 6 days ago

Hi @yoloyash, it looks like you are set for now! Please re-open the thread if you want to continue the conversation. And if you enjoy these repos, don't forget to give us a star if you haven't already. Thanks, we appreciate your support! Jeannie / Neural Magic

yoloyash commented 4 days ago

Hi @jeanniefinks thank you!

@bfineran (This is about YOLOv8.) I am trying to follow the sparse-transfer learning guide for YOLOv8 on the VOC dataset, starting from the COCO-pruned model. I followed the exact recipe given there, but my results are not even close to the ones reported. The ONNX model provided is 26 MB, while the one I get from training with the provided recipe is 33 MB. Additionally, quantizing the model increased its latency by a lot, on both GPU and CPU, and for both backends (onnxruntime and deepsparse).

Another doubt I had: the model from the SparseZoo link above expects a float32 input, while the model I got from training requires uint8/int8 input. I think I might be missing a few steps, can you help out please?

Edit: Seems like this is currently a bug. #2276

Edit 2: Upon further inspection, I see that the provided model has an activation_post_process/QuantizeLinear layer that converts the float32 input to uint8, which is why that model accepts float32 input. Please let me know if there is any fix!
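
For anyone hitting the same confusion, a small check using only the `onnx` package can show whether an exported model still takes float32 input, i.e. whether it carries that input-side QuantizeLinear; the file name below is a placeholder:

```python
# Print the graph input dtypes of an exported ONNX model and look for a
# QuantizeLinear node that consumes a graph input directly.
import onnx

model = onnx.load("model.onnx")  # path to the exported or downloaded model

for inp in model.graph.input:
    elem_type = inp.type.tensor_type.elem_type
    print(f"input {inp.name}: {onnx.TensorProto.DataType.Name(elem_type)}")

input_names = {inp.name for inp in model.graph.input}
for node in model.graph.node:
    if node.op_type == "QuantizeLinear" and input_names & set(node.input):
        print(f"QuantizeLinear attached to graph input: {node.name or node.input[0]}")
```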