Closed porkfatmystery closed 2 days ago
A little update -- for now I have just added the following method in QuantizationSimModel (aimet_torch/quantsim.py):
```python
def deactivate_layer(self, layername: str):
    found = False
    for name, module in self.model.named_modules():
        if not isinstance(module, QcQuantizeWrapper):
            continue
        if name == layername:
            module.enable_param_quantizers(False, None)
            module.enable_activation_quantizers(False)
            print("deactivated QcQuantizeWrapper for '%s'" % layername)
            found = True
    if not found:
        print("DID NOT FIND QcQuantizeWrapper for '%s'" % layername)
```
I call it with all the encoder-side layers that I do not wish to have quantized.
For the moment, it seems to do what I want. Can you see any issues with this, or is there a different way of doing the same thing?
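For anyone following along without an AIMET install, the traversal-and-match logic of `deactivate_layer` can be exercised in isolation. A minimal sketch, where `StubWrapper` and the dict of `(name, wrapper)` pairs are stand-ins for AIMET's `QcQuantizeWrapper` and `model.named_modules()`, and the layer names are hypothetical:

```python
# Standalone sketch: StubWrapper mimics just enough of a quantize wrapper
# (enable/disable flags) to demonstrate the match-by-name pattern above.
class StubWrapper:
    def __init__(self):
        self.param_quant_enabled = True
        self.act_quant_enabled = True

    def enable_param_quantizers(self, enabled):
        self.param_quant_enabled = enabled

    def enable_activation_quantizers(self, enabled):
        self.act_quant_enabled = enabled


def deactivate_layer(wrappers, layername):
    """Disable quantizers on the wrapper whose name matches layername.

    Returns True if a matching wrapper was found, False otherwise.
    """
    found = False
    for name, module in wrappers.items():
        if name == layername:
            module.enable_param_quantizers(False)
            module.enable_activation_quantizers(False)
            found = True
    return found


# Hypothetical layer names: "g_a.*" for the encoder, "g_s.*" for the decoder.
wrappers = {"g_a.conv1": StubWrapper(), "g_s.conv1": StubWrapper()}
assert deactivate_layer(wrappers, "g_a.conv1")         # encoder layer found
assert not wrappers["g_a.conv1"].param_quant_enabled   # its quantizers are off
assert wrappers["g_s.conv1"].act_quant_enabled         # decoder layer untouched
assert not deactivate_layer(wrappers, "no.such.layer") # a miss is reported
```

Note that the loop keeps scanning after a match rather than breaking, mirroring the original method; with unique module names an early `break` would also be fine.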
@pzfm3022 I'm another AIMET user, and I'm wondering if you have resolved this issue. If not, perhaps I can offer another solution.
@LLNLanLeN thanks for your comment -- I discovered an 'exclude_layers_from_quantization' field in QuantizationSimModel that generally did what I wanted. I had to make a few fixes to get autoquant working with it when I played with autoquant.
Is that what you were trying? Or are you doing something different?
@pzfm3022 glad to hear that it's working for you. I've been using an older version of AIMET (1.17.0, IIRC), so that feature didn't exist yet (I think; I might be wrong), and I took a different approach, one that might not be as elegant as yours. Instead of creating one network containing both the encoder and the decoder, I created two networks: one containing only the encoder, and the other only the decoder. That way, you can apply the quant config to the selected part only.
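The two-network split can be sketched without AIMET at all. In this illustration, `MockQuantSim` is a stand-in for `QuantizationSimModel` (it just tags the model it wraps), and `Encoder`/`Decoder` are placeholder modules with made-up transforms; the point is only that the sim wraps the decoder while the encoder stays plain float:

```python
# Sketch of the two-network split: the codec is built as two separate
# models so that the quantization sim can be applied to the decoder alone.
class Encoder:
    def forward(self, x):
        return x * 0.5              # placeholder analysis transform

class Decoder:
    def forward(self, y):
        return y * 2.0              # placeholder synthesis transform

class MockQuantSim:
    """Stand-in for QuantizationSimModel: wraps one model for quantization."""
    def __init__(self, model):
        self.model = model
        self.model.quantized = True # mark as under quantization sim

encoder = Encoder()
decoder = Decoder()
sim = MockQuantSim(decoder)         # quant config applied to the decoder only

# The full codec still runs end to end: float encoder -> "quantized" decoder.
reconstruction = sim.model.forward(encoder.forward(4.0))
assert reconstruction == 4.0
assert getattr(decoder, "quantized", False)
assert not getattr(encoder, "quantized", False)
```

The trade-off versus per-layer exclusion is that you must plumb the tensors between the two models yourself, but the quantization boundary is then explicit in the code rather than hidden in a layer list.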
@LLNLanLeN -- very nice! And, true, I used a later version of AIMET to get the exclude_layers* function. I then threw my solution (above) away. ;)
Still, if you use it, you can disable individual decoder-side layers when you hit quantization problems, to see which layers are sensitive to quantization.
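That sensitivity check amounts to a leave-one-out sweep: disable quantization on one layer at a time, re-evaluate, and rank layers by how much the error drops. A minimal sketch, where the per-layer error contributions and the `g_s.*` decoder layer names are invented for illustration; in a real run `eval_fn` would re-run the quantized model on validation data instead of summing canned numbers:

```python
# Sketch of a leave-one-out quantization sensitivity sweep.
# QUANT_ERROR maps a (hypothetical) decoder layer name to the error its
# quantizers contribute; eval_fn fakes a model evaluation by summing the
# contributions of every still-quantized layer.
QUANT_ERROR = {"g_s.conv1": 0.30, "g_s.conv2": 0.02, "g_s.conv3": 0.10}

def eval_fn(disabled=None):
    return sum(err for name, err in QUANT_ERROR.items() if name != disabled)

def sensitivity_sweep(layer_names, eval_fn):
    """Return {layer: error drop when that layer's quantizers are disabled}."""
    baseline = eval_fn()
    # A larger drop means the layer is more sensitive to quantization.
    return {name: baseline - eval_fn(disabled=name) for name in layer_names}

drops = sensitivity_sweep(QUANT_ERROR, eval_fn)
most_sensitive = max(drops, key=drops.get)
assert most_sensitive == "g_s.conv1"
```

Layers at the top of the ranking are the ones worth keeping in float, or targeting first with quantization-aware training.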
Just a little context -- we are using a PyTorch autoencoder-based video codec, so it has 'encoder' and 'decoder' halves that work together. We would like to activate quantization only on the decoder side (which will eventually run on the client), while the encoder side remains full floating point. Then we can see the effects of quantization, and how quantization-aware training can reduce any error.
Looking at the controls available in config.json, I cannot find a way to activate/deactivate quantization on a per-layer basis in our model. Is it possible at all?
For example, at the moment both the encoder and the decoder are quantized; e.g., a "g_a" analysis (encoder) layer reports:
Thanks in advance for any info!