huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

Error: conv2d() received an invalid combination of arguments after quantizing the model #256

Open KhaoKhao opened 1 month ago

KhaoKhao commented 1 month ago

Hi, I’m encountering an issue when trying to quantize a model. The quantization process completes successfully, but when I attempt to calibrate or make a prediction, I receive the following error:

TypeError: conv2d() received an invalid combination of arguments - got (Tensor, NoneType, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Tensor, !NoneType!, !Parameter!, !tuple of (int, int)!, !tuple of (int, int)!, !tuple of (int, int)!, int)
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (Tensor, !NoneType!, !Parameter!, !tuple of (int, int)!, !tuple of (int, int)!, !tuple of (int, int)!, int) 

I investigated the error and found that it seems to originate from line 10 below. Note that the second argument to conv2d() (the weight) is a NoneType, as if the quantized weight were never passed:

TypeError                                 Traceback (most recent call last)
Cell In[30], line 10
      8 # print(moire_model_int8)
      9 with Calibration():
---> 10 test(moire_model_int8, test_loader, device, criterion)

Cell In[24], line 10
      8 for image, label in test_loader:
      9     image, label = image.to(device), label.to(device)
---> 10     output = model(image)
     11     if isinstance(output, QTensor):
     12         output = output.dequantize()

and here is the model after quantization:

(conv1): Sequential(
    (0): QConv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv2): Sequential(
    (0): QConv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3): Sequential(
    (0): QConv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv4): Sequential(
    (0): QConv2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout): Dropout(p=0.25, inplace=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear1): QLinear(in_features=25088, out_features=256, bias=True)
  (linear2): QLinear(in_features=256, out_features=128, bias=True)
  (linear3): QLinear(in_features=128, out_features=64, bias=True)
  (linear4): QLinear(in_features=64, out_features=32, bias=True)
  (linear5): QLinear(in_features=32, out_features=16, bias=True)
  (linear6): QLinear(in_features=16, out_features=2, bias=True)
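
(For reference, a model definition consistent with this printout would look roughly like the sketch below; the class name, the functional ReLUs, and the placement of the shared pool are assumptions, chosen so that flattening yields the 25088 features expected by linear1 for a 1x224x224 input.)

import torch.nn as nn
import torch.nn.functional as F

class MoireCNN(nn.Module):  # hypothetical name, reconstructed from the printout above
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64))
        self.conv2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.BatchNorm2d(32))
        self.conv3 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.BatchNorm2d(16))
        self.conv4 = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.BatchNorm2d(8))
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(25088, 256)
        self.linear2 = nn.Linear(256, 128)
        self.linear3 = nn.Linear(128, 64)
        self.linear4 = nn.Linear(64, 32)
        self.linear5 = nn.Linear(32, 16)
        self.linear6 = nn.Linear(16, 2)

    def forward(self, x):
        # Assumed ordering: pooling after conv2 and conv4 gives
        # 8 x 56 x 56 = 25088 flattened features for a 1x224x224 input.
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))
        x = self.dropout(self.flatten(x))
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        x = F.relu(self.linear4(x))
        x = F.relu(self.linear5(x))
        return self.linear6(x)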

I would greatly appreciate any assistance you could provide with this problem. I have been trying to find a solution but have been unsuccessful so far. Thank you in advance for your help.

dacorvo commented 1 month ago

@KhaoKhao thank you for your feedback. As a first step, can you verify your script works on the non-quantized model? If so, can you provide that script so that the issue can be reproduced?

KhaoKhao commented 1 month ago

> @KhaoKhao thank you for your feedback. As a first step, can you verify your script works on the non-quantized model? If so, can you provide that script so that the issue can be reproduced?

Sure. I've tested the script with the non-quantized model, and it works perfectly fine. Here is the script I used (I implemented it based on quantize_mnist_model.py in the repo).

import os
import time

import torch
from optimum.quanto import QTensor

def test(model, test_loader, device, criterion):
    model.to(device)
    model.eval()
    test_loss = 0.0
    correct = 0
    with torch.no_grad():
        start = time.time()
        for image, label in test_loader:
            image, label = image.to(device), label.to(device)
            output = model(image)
            # Quantized models may return a QTensor; dequantize before computing metrics
            if isinstance(output, QTensor):
                output = output.dequantize()
            loss = criterion(output, label)
            test_loss += loss.item()
            pred = torch.argmax(output, 1)
            correct += torch.sum(pred == label.data).item()
        end = time.time()
    avg_loss = test_loss / len(test_loader)
    accuracy = 100 * correct / len(test_loader.dataset)
    torch.save(model.state_dict(), 'tmp.pt')  # Save temporarily to measure model size
    print(output.shape)
    print(f'  -Evaluated in {end - start} s.\n  -Average loss : {avg_loss} \n  -Accuracy : {accuracy} \n  -Size : {"%.2f MB" % (os.path.getsize("tmp.pt") / 1e6)}')
    os.remove('tmp.pt')

This part works perfectly fine:

import torch
import torch.nn as nn
from torchvision import models

###------- Loading model -------###
model = models.resnet18(pretrained=False)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)  # Two-class head
model = model.to(device)
# Load weights
model.load_state_dict(torch.load('PATH_TO_MODEL_WEIGHT'))

###------- Float model -------###
print('Float model :')
test(model, test_loader, device, criterion)

But this part produced the error reported above.

from optimum.quanto import Calibration, quantize, qint8

###------- Quantization -------###

###------- int8 -------###
print('Quantize int8 model :')
quantize(model, weight=qint8, activations=None)
print(' *Calibrating...')
print(model)
with Calibration():
    test(model, test_loader, device, criterion)

I've tried other models as well, and the problem seems to originate from the conv2d layers. Maybe the weight argument is missing after quantization? PS: I've tried your model (dacorvo/mnist-mlp), and it works perfectly fine.
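
A quick way to check the missing-weight hypothesis (assuming QConv2d and QLinear are importable from optimum.quanto.nn and still expose the usual .weight attribute) would be to inspect the quantized modules directly:

from optimum.quanto.nn import QConv2d, QLinear

# Print the weight type of every quantized module; a None (or a plain float
# tensor where a quantized one is expected) would point at the failing layer.
for name, module in model.named_modules():
    if isinstance(module, (QConv2d, QLinear)):
        w = module.weight
        print(f"{name}: weight={'None' if w is None else type(w).__name__}")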

Thanks for your help!

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

dacorvo commented 1 week ago

@KhaoKhao I don't know which model you used for your tests, but this works on my setup with optimum-quanto 0.2.3, 0.2.4, or the main branch:

from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
from optimum.quanto import Calibration, quantize, qint8

# Single sample image used for calibration and inference
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

image_processor = AutoImageProcessor.from_pretrained("microsoft/resnet-18")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-18")

# Quantize both weights and activations to int8
quantize(model, weights=qint8, activations=qint8)

inputs = image_processor(image, return_tensors="pt")
with Calibration():
    logits = model(**inputs).logits

print(logits)
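
If it helps, the usual next step after calibration in the quanto workflow is to freeze the model, so that inference afterwards runs directly on the quantized weights:

from optimum.quanto import freeze

# Replace the float weights with their quantized versions; after freezing,
# the model can be used for inference without a Calibration context.
freeze(model)
print(model(**inputs).logits)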