quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Quantization simulation configuration question #1424

Closed · porkfatmystery closed this issue 2 days ago

porkfatmystery commented 2 years ago

Just a little context -- we are using a PyTorch auto-encoder-based video codec, so it has 'encoder' and 'decoder' halves that work together. We would like to activate quantization only on the decoder side (which will eventually run on the client), while the encoder side remains full floating point. Then we can see the effects of quantization, and how quantization-aware training can reduce any error.

Looking at the controls available in config.json, I cannot find a way to activate/deactivate quantization on a per-layer basis in our model. Is it possible at all?

For example, at the moment both the encoder and the decoder are quantized; e.g., a "g_a" analysis (encoder) layer reports:

Layer: codec_net.codec_net.g_a.layers.0.layers.1
  Input[0]: Not quantized
  -------
  Param[weight]: bw=8, encoding-present=True
    StaticGrid TensorQuantizer:
    quant-scheme:QuantScheme.post_training_tf_enhanced, round_mode=RoundingMode.ROUND_NEAREST, bitwidth=8, enabled=True
    min:-0.5131248235702515, max=0.4857059717178345, delta=0.003916983492672443, offset=-131.0
  -------
  Param[bias]: bw=8, encoding-present=True
    StaticGrid TensorQuantizer:
    quant-scheme:QuantScheme.post_training_tf_enhanced, round_mode=RoundingMode.ROUND_NEAREST, bitwidth=8, enabled=True
    min:-0.22618022561073303, max=0.20747357606887817, delta=0.0017006031703203917, offset=-133.0
  -------
  Output[0]: bw=8, encoding-present=False
  -------

Thanks in advance for any info!

porkfatmystery commented 2 years ago

A little update -- for now I have just added the following method to QuantizationSimModel (aimet_torch/quantsim.py):

    def deactivate_layer(self, layername: str):
        """Disable all quantizers on the wrapped module with the given name."""
        found = False
        for name, module in self.model.named_modules():
            # Only quantization-wrapped modules carry quantizers to disable.
            if not isinstance(module, QcQuantizeWrapper):
                continue
            if name == layername:
                # Turn off both parameter and activation quantizers so this
                # layer runs in full floating point.
                module.enable_param_quantizers(False, None)
                module.enable_activation_quantizers(False)
                print("deactivated QcQuantizeWrapper for '%s'" % layername)
                found = True
        if not found:
            print("DID NOT FIND QcQuantizeWrapper for '%s'" % layername)

I call it with all the encoder-side layers that I do not wish to have quantized.

For the moment, it seems to do what I want. Can you see any issues with this, or is there a different way of doing the same thing?
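
A minimal usage sketch (not from the thread): 'sim' is assumed to be an existing QuantizationSimModel instance, the encoder-side name prefix is taken from the layer printout above, and the QcQuantizeWrapper import path may differ between AIMET versions.

    # Hypothetical usage: disable quantization on all encoder-side ("g_a")
    # wrappers of an existing QuantizationSimModel instance `sim`.
    from aimet_torch.qc_quantize_op import QcQuantizeWrapper

    ENCODER_PREFIX = "codec_net.codec_net.g_a"   # taken from the printout above

    encoder_layers = [
        name for name, module in sim.model.named_modules()
        if isinstance(module, QcQuantizeWrapper) and name.startswith(ENCODER_PREFIX)
    ]

    for name in encoder_layers:
        sim.deactivate_layer(name)   # the patched method shown above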

LLNLanLeN commented 1 year ago

@porkfatmystery I'm another AIMET user, and I'm wondering whether you have resolved this issue. If not, perhaps I can offer another solution.

porkfatmystery commented 1 year ago

@LLNLanLeN thanks for your comment -- I discovered an 'exclude_layers_from_quantization' field in QuantizationSimModel that generally did what I wanted. I had to make a few fixes to get autoquant working with it when I played with autoquant.

Is that what you were trying? Or are you doing something different?
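
For later readers, a rough sketch of how that might look. This is illustrative only: float_model, dummy_input, and the module-name prefix are assumptions, and the exact signature of exclude_layers_from_quantization should be checked against the AIMET version in use.

    # Hypothetical sketch: exclude the encoder-side ("g_a") modules from
    # quantization. The assumed signature (a list of module references) may
    # not match every AIMET release, so treat this as pseudocode to adapt.
    from aimet_torch.quantsim import QuantizationSimModel

    encoder_modules = [
        module for name, module in float_model.named_modules()
        if name.startswith("codec_net.codec_net.g_a")
    ]

    sim = QuantizationSimModel(float_model, dummy_input=dummy_input)
    sim.exclude_layers_from_quantization(encoder_modules)   # assumed signature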

LLNLanLeN commented 1 year ago

@porkfatmystery glad to hear that it's working for you. I've been using an older version of AIMET (1.17.0, iirc), so this feature didn't exist yet (I think; I might be wrong), and I took a different approach, one that might not be as elegant as yours. Basically, instead of creating one network containing both the encoder and the decoder, I created two networks: one containing only the encoder and the other containing only the decoder. This way you can apply the quantization config to the selected part only.
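
A rough sketch of that split, assuming hypothetical Encoder/Decoder modules, a made-up latent shape, and a calibration_frames iterable; the QuantizationSimModel constructor arguments differ slightly across AIMET versions.

    import torch
    from aimet_torch.quantsim import QuantizationSimModel

    # Hypothetical stand-ins for the two halves of the codec.
    encoder = Encoder().eval()   # stays in full floating point, never wrapped
    decoder = Decoder().eval()   # only this half is handed to AIMET

    # Quantize the decoder only; the latent shape here is an assumption.
    dummy_latent = torch.randn(1, 192, 16, 16)
    sim = QuantizationSimModel(decoder, dummy_input=dummy_latent,
                               default_param_bw=8, default_output_bw=8)

    def forward_pass(model, _):
        # Calibration: run the float encoder, then push its latents through
        # the quantized decoder to collect activation ranges.
        with torch.no_grad():
            for frames in calibration_frames:   # assumed calibration data
                model(encoder(frames))

    sim.compute_encodings(forward_pass, forward_pass_callback_args=None)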

porkfatmystery commented 1 year ago

@LLNLanLeN -- very nice! And yes, I used a later version of AIMET to get the exclude_layers* function, and then threw my solution (above) away. ;)

Still, if you use it, you can disable some decoder-side layers when you run into quantization problems, to see which layers are sensitive to quantization.
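
As a rough illustration of that kind of sensitivity check, reusing the wrapper methods from the snippet earlier in the thread; sim, evaluate_model, and the decoder-side prefix are assumptions.

    # Hypothetical sensitivity sweep: run one decoder-side layer at a time in
    # floating point and compare against the fully quantized baseline.
    from aimet_torch.qc_quantize_op import QcQuantizeWrapper

    DECODER_PREFIX = "codec_net.codec_net.g_s"   # assumed synthesis/decoder prefix

    decoder_wrappers = [
        (name, module) for name, module in sim.model.named_modules()
        if isinstance(module, QcQuantizeWrapper) and name.startswith(DECODER_PREFIX)
    ]

    baseline = evaluate_model(sim.model)   # assumed quality metric, e.g. PSNR

    for name, wrapper in decoder_wrappers:
        # Temporarily run this layer in float.
        wrapper.enable_param_quantizers(False, None)
        wrapper.enable_activation_quantizers(False)
        score = evaluate_model(sim.model)
        print(f"{name}: {score:.4f} with layer in float (baseline {baseline:.4f})")
        # Restore quantization before testing the next layer.
        wrapper.enable_param_quantizers(True, None)
        wrapper.enable_activation_quantizers(True)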