neuralmagic / sparsify

ML model optimization product to accelerate inference.
Apache License 2.0

Quantization in UI #57

Closed IdoZach closed 2 years ago

IdoZach commented 3 years ago

Hi, quantization is not available in the UI. Could you provide an approximate ETA? Is there a recommended course of action for performing pruning through the UI and quantization by other means? Thanks!

markurtz commented 3 years ago

Hi @IdoZach, thank you for the question. Quantization support is currently planned for the beta version of Sparsify, slated for roughly two months out. We are looking for active users to design-test the Sparsify beta, so let us know if you would be interested in running through the current designs and giving feedback!

In the interim, the current workarounds would be one of the following:

We are quickly adding functionality to SparseML to make this more streamlined over the next few weeks, including APIs for creating recipes. In addition, we will be releasing tutorials on how to do all three of these steps and use the APIs in the next two weeks. I'll update here once those are launched, and please let us know if you have any other feedback on them!
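As a rough sketch of the manual-recipe workaround in the meantime: a Sparsify-exported recipe can be applied to a standard PyTorch training loop through SparseML's ScheduledModifierManager. The toy model, data, and recipe path below are placeholders, and the exact manager calls may differ by SparseML version, so treat this as an illustration rather than an official tutorial:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Toy stand-ins; swap in your real model, data, and Sparsify-exported recipe.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10)
)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
train_loader = DataLoader(data, batch_size=16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Load the recipe; its modifiers drive pruning/quantization on their own schedule.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(manager.max_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()  # modifiers hook in here per the recipe schedule

manager.finalize(model)  # remove modifier hooks once training completes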

Thanks, Mark

IdoZach commented 3 years ago

Thanks. If I have a custom recipe prepared through Sparsify, with a final pruning epoch of 50 and 100 total epochs, then I would add the following:

- !QuantizationModifier
    start_epoch: 50.0

Correct? Then, after 50 pruning epochs, it would start to quantize the model for the rest of training? Or are more options needed?

markurtz commented 3 years ago

Correct, and you'll also want to set the submodules that should be quantized in your model. Additionally, setting disable_quantization_observer_epoch and freeze_bn_stats_epoch can help recovery. disable_quantization_observer_epoch freezes the observer params and generally should be set after your last or second-to-last LR step so you can continue to train without variable quantization statistics. freeze_bn_stats_epoch is along the same vein: it freezes the batch norm stats to allow for better recovery when quantizing and should be set to just after disable_quantization_observer_epoch. An example implementation from our ResNet-50 recipe looks like this:

- !QuantizationModifier
    start_epoch: 100
    submodules:
      - input
      - sections
    disable_quantization_observer_epoch: 115
    freeze_bn_stats_epoch: 116

Where the total epochs are 135 and pruning stopped at epoch 60.

Note that we let it recover for roughly 40 epochs in the pruning stage before beginning quantization. We do have a few experiments suggesting the pruning fine-tuning phase is unnecessary and can be replaced purely with quantization; however, not enough to say so definitively. So, if there are issues with recovery during quantization, you may want to train for a bit after pruning before starting quantization.

For your example, I would recommend:

- !QuantizationModifier
    start_epoch: 50
    submodules:
      - fill in based on your model
    disable_quantization_observer_epoch: 90
    freeze_bn_stats_epoch: 91

You will also need to add the following to the pruning modifier if you are planning to run in the DeepSparse Engine on Intel CPUs with VNNI:

mask_type: [1, 4]

This sets pruning to work in blocks of 4, which is a requirement for getting a speedup from sparse quantization on Intel CPUs.

Let me know if you have any further issues; happy to help more!

IdoZach commented 3 years ago

Hi, after training finishes, how can I verify that my model is indeed quantized? Similarly, when converting to ONNX via ModuleExporter, should I set convert_qat? Thanks.

markurtz commented 3 years ago

Hi @IdoZach, sorry about the delay; I did not see this notification come through. You can verify that the model is quantized by looking at the wrapped modules: they should have quantization wrappers around them instead of the original convs. To export, setting convert_qat is correct. A full example can be seen in our yolov5 integration: https://github.com/neuralmagic/yolov5/blob/master/models/export.py#L225
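To make that concrete, here is a minimal sketch of both checks. It assumes a trained QAT model and a 224x224 RGB input; counting FakeQuantize observers is just one plausible way to spot the quantization wrappers, and the function and output directory names are illustrative:

import torch
from sparseml.pytorch.utils import ModuleExporter

def verify_and_export(model: torch.nn.Module, sample_batch: torch.Tensor) -> None:
    # A QAT-prepared model carries fake-quantization observers; counting them
    # is a quick sanity check that the QuantizationModifier took effect.
    fake_quants = [m for m in model.modules() if isinstance(m, torch.quantization.FakeQuantize)]
    print(f"found {len(fake_quants)} FakeQuantize modules")

    # Export to ONNX; convert_qat=True folds the QAT graph into real quantized ops.
    exporter = ModuleExporter(model, output_dir="exported")
    exporter.export_onnx(sample_batch=sample_batch, convert_qat=True)

# e.g., after training: verify_and_export(model, torch.randn(1, 3, 224, 224))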

Thanks, Mark

jeanniefinks commented 2 years ago

Hello @IdoZach, as it's been some time with no response, I am going to go ahead and close this issue. Please re-open if you have a follow-up. Also, I invite you to "star" our Sparsify repo if you like! We enjoy seeing the community support. https://github.com/neuralmagic/sparsify/

Thank you! Jeannie / Neural Magic