huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0
774 stars 55 forks

Accuracy took a big hit with activation=qint8 for an open clip model #216

Closed — kechan closed this issue 1 month ago

kechan commented 3 months ago

I am testing out using optimum.quanto on this open clip model:

model = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K")

I then quantized it like this:

from optimum.quanto import Calibration, freeze, qfloat8, qint4, qint8, quantize
quantized_model = copy.deepcopy(model)
quantize(quantized_model, weights=qint8, activations=qint8)  
freeze(quantized_model)
quantized_model = quantized_model.to(device)
quantized_model.eval();

I proceeded to test inference on an image (to obtain its vector representation):

image_url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

image_inputs = processor(images=[image], return_tensors="pt")
image_inputs = {name: tensor.to(device) for name, tensor in image_inputs.items()}

with torch.no_grad():
  image_features = model.get_image_features(**image_inputs)  # or quantized_model.get_image_features(...)
  image_features = F.normalize(image_features, p=2, dim=1)

The image_features look wildly different. In particular, the output from the quantized model contains a lot of zeros:

0.0000, -0.0626, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, -0.0626, 0.0000, 0.0000, 0.0000, 0.0000, 0.0626, 0.0000, 0.0000, 0.0000, 0.1252, -0.0626, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, -0.0626, 0.0000, -0.0626, 0.0000, 0.0000, 0.0626, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0626, 0.0000, 0.0000, 0.0000
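One way to quantify the divergence, rather than eyeballing the vectors, is cosine similarity between the original and quantized features. A minimal helper in plain Python (a hypothetical check, not part of the original report):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors.

    For L2-normalized CLIP features this is just the dot product,
    but computing the norms makes it work on raw features too.
    Values near 1.0 mean the quantized model tracks the original;
    values near 0.0 indicate the kind of collapse shown above.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)
```

With torch tensors, `torch.nn.functional.cosine_similarity(orig, quant, dim=1)` does the same thing batched.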

I proceeded to test my downstream task accuracy, and it is completely destroyed. I think I may be missing something here and am not doing something right. If you quantize the activations (which I think PyTorch refers to as static quantization?), is calibration mandatory? If you have experience, please let me know. I will try it anyway when I get around to it.

dacorvo commented 3 months ago

@kechan you're right: if you quantize the activations, a calibration is required. You can refer to the MNIST classification example to see how it works: https://github.com/huggingface/optimum-quanto/blob/main/examples/vision/image-classification/mnist/quantize_mnist_model.py

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 5 days with no activity.