satabios / sconce

Model Compression/Inference Made Easy
https://sconce.readthedocs.io/en/latest/
MIT License
38 stars, 2 forks

Pruning seems to be an invasive technique? How does the package handle the performance degradation? #5

Open Magzz1 opened 7 months ago

satabios commented 7 months ago

Yes, pruning is an invasive process. However, the damage stays small if you can find the sweet spot, i.e. the tradeoff between model degradation and removing redundant weights.

To find that sweet spot, the package employs a layer-wise sensitivity scan that goes through every layer of the model and determines how much each one can be pruned. Usually, such a scan is quite expensive, but the package has one of the fastest ways to find the best pruning ratio.
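To make the idea concrete, here is a minimal sketch of a layer-wise sensitivity scan. It is not sconce's actual implementation: `magnitude_prune`, `sensitivity_scan`, the toy model, and the `toy_accuracy` metric are all hypothetical stand-ins, and a real scan would evaluate the network on a validation set instead.

```python
from typing import Callable, Dict, List, Sequence

Model = Dict[str, List[float]]  # hypothetical: layer name -> flat list of weights


def magnitude_prune(weights: List[float], ratio: float) -> List[float]:
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    n_prune = int(round(len(weights) * ratio))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sensitivity_scan(
    model: Model,
    evaluate: Callable[[Model], float],
    ratios: Sequence[float],
    max_drop: float,
) -> Dict[str, float]:
    """Per layer, pick the largest ratio whose accuracy drop stays within `max_drop`.

    Only one layer is pruned at a time; all other layers stay dense.
    """
    baseline = evaluate(model)
    chosen: Dict[str, float] = {}
    for name, weights in model.items():
        best = 0.0
        for ratio in sorted(ratios):
            trial = dict(model)
            trial[name] = magnitude_prune(weights, ratio)
            if baseline - evaluate(trial) <= max_drop:
                best = ratio
            else:
                break  # this layer is too sensitive beyond this ratio
        chosen[name] = best
    return chosen


# Toy model: "conv1" has only large weights, "fc" has mostly tiny ones.
model: Model = {
    "conv1": [0.9, -0.8, 0.7, 0.6],
    "fc": [0.01, -0.02, 0.9, 0.03],
}
reference_mass = sum(abs(w) for layer in model.values() for w in layer)


def toy_accuracy(m: Model) -> float:
    """Stand-in metric (assumption): fraction of total weight magnitude kept."""
    return sum(abs(w) for layer in m.values() for w in layer) / reference_mass


ratios_per_layer = sensitivity_scan(
    model, toy_accuracy, ratios=[0.25, 0.5, 0.75], max_drop=0.05
)
print(ratios_per_layer)
```

In this toy setup the scan leaves the sensitive layer dense and prunes the redundant one heavily, which is the behaviour the scan is meant to find.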

The tutorial explains this in detail: https://sconce.readthedocs.io/en/latest/tutorials/Pruning.html#lets-first-evaluate-the-accuracy-and-model-size-of-dense-model

Look out for the header “Sensitivity Scan”.

Thus, when pruning, we make an informed decision to remove only the weights that are redundant. Fine-tuning is also applied after pruning so that the degraded accuracy can be regained.

I hope this answers your question. Feel free to reopen the issue if you are not satisfied with the answer.
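As a rough illustration of pruning followed by fine-tuning, the sketch below prunes a tiny linear model and then retrains it with a mask so that pruned weights stay at zero. Everything here (the data, `mse`, `fine_tune`, the learning rate and step count) is a made-up example, not the package's API.

```python
from typing import List, Sequence


def mse(weights: Sequence[float], xs, ys) -> float:
    """Mean squared error of a linear model y = w . x."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(xs)


def fine_tune(weights, mask, xs, ys, lr: float = 0.1, steps: int = 200) -> List[float]:
    """Gradient descent on the MSE; the mask keeps pruned weights at zero."""
    w = [wi * m for wi, m in zip(weights, mask)]
    n = len(xs)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grads[j] += 2.0 * err * xj / n
        w = [(wi - lr * g) * m for wi, g, m in zip(w, grads, mask)]
    return w


# Toy data generated by the "true" weights [2.0, 0.1, -3.0].
xs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
ys = [2.0, 0.1, -3.0, -0.9]

dense = [1.8, 0.1, -2.7]   # an imperfectly trained dense model
mask = [1.0, 0.0, 1.0]     # prune the smallest-magnitude weight (index 1)
pruned = [w * m for w, m in zip(dense, mask)]

loss_after_pruning = mse(pruned, xs, ys)
tuned = fine_tune(pruned, mask, xs, ys)
loss_after_tuning = mse(tuned, xs, ys)
print(loss_after_pruning, loss_after_tuning, tuned)
```

The remaining weights adjust to compensate for the removed one, so the loss after fine-tuning ends up below the loss measured right after pruning, while the pruned weight stays exactly zero.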

satabios commented 7 months ago

To add to the above point, the final result table gives a quantitative glimpse of the technique.

Also note that, unlike quantization, pruning actually reduces the number of MAC operations.
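A back-of-the-envelope sketch of the MAC argument, assuming structured (channel) pruning of a single convolution layer; the layer shape and the `conv_macs` helper are hypothetical. Quantization keeps the same shapes and only shrinks the bits per value, so its MAC count is unchanged.

```python
def conv_macs(c_in: int, c_out: int, kernel: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulate operations of a standard 2D convolution (no groups)."""
    return c_in * c_out * kernel * kernel * h_out * w_out


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 32x32 output feature map.
dense_macs = conv_macs(64, 128, 3, 32, 32)

# Channel pruning removes half of the output channels: fewer MACs.
pruned_macs = conv_macs(64, 64, 3, 32, 32)

# Quantizing the dense layer (e.g. float32 -> int8) leaves the shapes untouched.
quantized_macs = conv_macs(64, 128, 3, 32, 32)

print(dense_macs, pruned_macs, quantized_macs)
```

Note that unstructured (fine-grained) sparsity only translates into fewer executed MACs when the runtime or hardware skips the zeroed weights.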

image