huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

Unable to quantize a single linear layer: throws ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1 #192

Closed · rajat-008 closed this 2 weeks ago

rajat-008 commented 2 months ago
import torch
import torch.nn as nn

class Model(nn.Module):
  def __init__(self):
    super().__init__()
    # A single output feature: the weight has shape (1, 10)
    self.layer = nn.Linear(10, 1)

  def forward(self, input):
    return self.layer(input)

Above is the model I have defined. When I quantize it and then call freeze, I get the error below.
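A minimal sketch of those calls (the quantize/freeze API is inferred from the traceback that follows; quantizing weights to qint8 is an assumption):

from quanto import freeze, qint8, quantize

model = Model()
quantize(model, weights=qint8)  # tag modules for 8-bit weight quantization
freeze(model)                   # raises the ValueError shown below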

ValueError                                Traceback (most recent call last)
<ipython-input-72-7bc888eff9ed> in <cell line: 1>()
----> 1 freeze(model)

/usr/local/lib/python3.10/dist-packages/quanto/quantize.py in freeze(model)
     38     for name, m in model.named_modules():
     39         if isinstance(m, QModuleMixin):
---> 40             m.freeze()

/usr/local/lib/python3.10/dist-packages/quanto/nn/qmodule.py in freeze(self)
    249 
    250     def freeze(self):
--> 251         qweight = self.qweight
    252         if qweight is not None:
    253             # Replace float weights by quantized weights

/usr/local/lib/python3.10/dist-packages/quanto/nn/qmodule.py in qweight(self)
    222         elif isinstance(self.weight_qtype, qtype):
    223             wscale = absmax_scale(self.weight, axis=0)
--> 224             return QTensor.quantize(self.weight, qtype=self.weight_qtype, axis=0, group_size=None, scale=wscale)
    225         raise ValueError(f"Invalid quantized weights type {self.weight_qtype}")
    226 

/usr/local/lib/python3.10/dist-packages/quanto/tensor/qtensor.py in quantize(cls, base, qtype, axis, group_size, scale)
    119         if scale is None:
    120             scale = absmax_scale(base, qtype, axis, group_size)
--> 121         return Quantizer.apply(base, qtype, axis, group_size, scale)
    122 
    123     def dequantize(self):

/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)
    551             # See NOTE: [functorch vjp and autograd interaction]
    552             args = _functorch.utils.unwrap_dead_wrappers(args)
--> 553             return super().apply(*args, **kwargs)  # type: ignore[misc]
    554 
    555         if not is_setup_ctx_defined:

/usr/local/lib/python3.10/dist-packages/quanto/tensor/qtensor.py in forward(ctx, base, qtype, axis, group_size, scale)
     45                 raise ValueError("QTensor can only be quantized along the first or last axis.")
     46             if base.shape[axis] == 1:
---> 47                 raise ValueError(f"Cannot quantize Tensor of shape {base.shape} along axis {axis} of size 1")
     48             if group_size is not None:
     49                 base = group(base, axis=axis, group_size=group_size)

ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1
dacorvo commented 2 months ago

As specified in the error message, you cannot quantize a Linear layer with a single output feature. The code should detect that and fall back to per-tensor quantization, so it is a bug. In the meantime, try increasing the number of output features: that should work.
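A sketch of that workaround, assuming the same quanto API as in the traceback (the layer sizes here are illustrative):

import torch.nn as nn
from quanto import freeze, qint8, quantize

# Weights are quantized per output channel (axis 0), so axis 0 must have size > 1.
model = nn.Sequential(nn.Linear(10, 2))  # out_features >= 2 avoids the failing path
quantize(model, weights=qint8)
freeze(model)  # succeeds: the weight shape is (2, 10)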

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been stalled for 5 days with no activity.

merveenoyan commented 3 weeks ago

Having the same issue when loading the OWLv2 model in 8-bit. When is this going to be fixed? @dacorvo This is a zero-shot model, so I cannot change much.

dacorvo commented 2 weeks ago

@merveenoyan In addition to fixing this, I added an object detection example based on OWLv2: https://github.com/huggingface/optimum-quanto/blob/main/examples/vision/object-detection/quantize_owl_model.py
You need to install the package from the main branch to use it.
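For example, a pip install straight from the main branch (a standard git install; the exact command is an assumption, not from the thread):

pip install git+https://github.com/huggingface/optimum-quanto.git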