sony / model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
https://sony.github.io/model_optimization/
Apache License 2.0

Is GPTQ working for pytorch? #784

Closed mikeseven closed 1 year ago

mikeseven commented 1 year ago

Issue Type

Bug

Source

source

MCT Version

Main

OS Platform and Distribution

Ubuntu 20.04

Python version

3.11

Describe the issue

Using the quick start, GPTQ consistently yields much worse accuracy than PTQ across torchvision classification models, from ResNet to EfficientNet. I would have expected similar or better accuracy.

Expected behaviour

Similar or better accuracy than PTQ.
Currently always much lower.

Code to reproduce the issue

Quick start tutorial with resnet18/34/50, efficientnet_b0, or efficientnet_v2_s, with the GPTQ option enabled.

Log output

No response

ofirgo commented 1 year ago

Hi @mikeseven, thank you for bringing this issue to our attention.

Can you please detail the accuracy results you're getting when running resnet18, resnet50, and efficientnet_b0 with and without GPTQ? I want to make sure that we are getting similar results before I can identify the source of the issue.

ofirgo commented 1 year ago

@mikeseven It is possible that you got poor results with GPTQ because the quick start arguments for GPTQ didn't have compatible default values. We fixed this (see #785), so you might want to try running GPTQ again.

As a general note, the GPTQ algorithm performs differently on different networks, and its hyperparameters may need per-network calibration. If the results are still unsatisfactory, try changing the learning rate (--gptq_lr) or increasing the number of GPTQ iterations (--gptq_num_calibration_iter). Note that the latter increases the runtime of the algorithm.

mikeseven commented 1 year ago

Dear @ofirgo,

Indeed, your fix improved my previous results on efficientnet_b0 by about 1.1%. However, the main contributor turned out to be the batch size used in the Hessian evaluation. Using MSE for both activations and weights also provides a slight improvement, which suggests that the quantization configurations for PTQ and GPTQ may need to differ. I'll have to play with the parameters to figure out how to truly leverage it.

Thanks for your help.

ofirgo commented 1 year ago

@mikeseven Thank you for the informative feedback.

I'm closing the issue since it seems you've figured it out. Let us know if you need any more help.