openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

change bitwidth for a specific layer after quantization #1135

Closed · mkimhi closed this issue 2 years ago

mkimhi commented 2 years ago

I would like to set a specific bitwidth for a specific layer, post-quantization. Is there a way to set the bitwidth directly? I don't want to copy the weight dict and set up a new quantization with a config.

Thanks

ljaljushkin commented 2 years ago

Hello @mkimhi!

There's a manual initialization of quantization for this.

Please refer to mobilenet_v2_imagenet_mixed_int_manual.json as a reference example.

There are 3 key things: 1) a list of bitwidths per operation scope:

"compression": {
  "algorithm": "quantization",
  "initializer": {
    "precision": {
      "type": "manual",
        "bitwidth_per_scope": [
          [4, "SqueezeNet/Sequential[classifier]/NNCFConv2d[1]/conv2d_0|WEIGHT"],
          ...

You can set an arbitrary bitwidth instead of 4 here. This list is dumped to bitwidth_per_scope.json in the log directory for the mixed-precision (HAWQ) case only: mobilenet_v2_imagenet_mixed_int_hawq.json. If this list is needed even for the fully INT8 quantization scenario, we could add an option to dump it for that case as well. Please let us know.

2) The target device should be TRIAL if you want to set an arbitrary bitwidth. The CPU device doesn't support INT4 at all; the VPU device does support INT4, but with some hardware-specific constraints:

"target_device": "TRIAL",

3) Load the pre-trained model checkpoint via the --weights option, not the --resume one. The first option triggers initialization of quantization (bitwidths and quantization ranges); the second restores training without any initialization. The sketch below shows how this maps onto the NNCF API.
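For reference, a minimal sketch of the same flow from Python, assuming a pre-trained checkpoint at model_fp32.pth (a placeholder path) and the sample config mentioned above; the load_state import path may vary across NNCF versions:

import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, load_state

model = build_model()  # your FP32 model definition (hypothetical helper)

# Equivalent of --weights: load plain weights, without restoring training state
state_dict = torch.load("model_fp32.pth", map_location="cpu")
load_state(model, state_dict, is_resume=False)

# Quantization (including the manual bitwidth_per_scope list) is initialized here
nncf_config = NNCFConfig.from_json("mobilenet_v2_imagenet_mixed_int_manual.json")
compression_ctrl, quantized_model = create_compressed_model(model, nncf_config)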

Optionally, you can change other quantization parameters for the TRIAL device:

      "algorithm": "quantization",
      "weights": {
            "mode": "asymmetric",
            "per_channel": true,
            "bits": 4
        },
        "activations": {
            "mode": "asymmetric"
        },
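Putting the pieces together, a hedged end-to-end sketch of such a config as a Python dict (the input shape and the scope string are placeholders; NNCFConfig also accepts a dict via from_dict):

from nncf import NNCFConfig

config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},  # placeholder shape
    "target_device": "TRIAL",
    "compression": {
        "algorithm": "quantization",
        "weights": {"mode": "asymmetric", "per_channel": True, "bits": 4},
        "activations": {"mode": "asymmetric"},
        "initializer": {
            "precision": {
                "type": "manual",
                "bitwidth_per_scope": [
                    [4, "SqueezeNet/Sequential[classifier]/NNCFConv2d[1]/conv2d_0|WEIGHT"]  # example scope
                ]
            }
        }
    }
})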
mkimhi commented 2 years ago

Thank you Nikolay,

My goal is to train a network with a QAT scheme, then change a specific layer's bitwidth and fine-tune the model a little more.

thank you

ljaljushkin commented 2 years ago

In that case, #1136 can be helpful: starting from that change, bitwidth_per_scope is printed to the console in debug mode:

import logging
from nncf.common.utils.logger import set_log_level

set_log_level(logging.DEBUG)
ljaljushkin commented 2 years ago

@mkimhi please let us know whether it helps to achieve your goals and whether we can close the issue

mkimhi commented 2 years ago

Thanks again, I figured out a solution for my use case:

for name, m in model.named_modules():
    if name == desired_layer:
        m.num_bits = desired_bits

Am I missing recalculating the scaling when I do that?

ljaljushkin commented 2 years ago

@mkimhi, no. The quantization scale stays the same for all bitwidths, so there is no need to recalculate it. The actual quantization of values takes num_bits into account and changes the total number of quantized values (e.g. 256 for INT8, 16 for INT4) and the interval between them, as illustrated below.
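As a toy illustration (not NNCF code; a symmetric, zero-centered fake-quantizer is assumed): with a fixed scale, lowering num_bits only coarsens the grid of representable values.

def fake_quantize(x: float, scale: float, num_bits: int) -> float:
    levels = 2 ** num_bits           # 256 for INT8, 16 for INT4
    step = 2 * scale / (levels - 1)  # interval between adjacent levels
    return step * round(x / step)    # snap to the nearest level

# Same input and same scale, coarser grid at the lower bitwidth:
print(fake_quantize(0.37, scale=1.0, num_bits=8))  # ~0.3686
print(fake_quantize(0.37, scale=1.0, num_bits=4))  # 0.4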

mkimhi commented 2 years ago

Then for some reason, high_level gets the value of 0 when I'm trying to go to binarization (desired_bits=1 in the code snippet above): [screenshot]

I would be very grateful for help solving this issue.

ljaljushkin commented 2 years ago

@mkimhi, sorry for the delayed answer. For binarization we use a different formula and different kernels (DoReFa or XNOR); the mixed-precision quantization algorithm doesn't support 1 bit. Please refer to the docs, code, and configs to learn more about binarization.
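For reference, a minimal sketch of a binarization config, assuming the "binarization" algorithm name and its "xnor"/"dorefa" modes from the NNCF docs (the input shape is a placeholder):

from nncf import NNCFConfig

binarization_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},  # placeholder shape
    "compression": {
        "algorithm": "binarization",
        "mode": "xnor"  # or "dorefa"
    }
})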

mkimhi commented 2 years ago

Thank you!