clementpoiret opened 10 months ago
Some more details: if I wrap x in QuantStub and DeQuantStub as in PyTorch's docs, such as:
# Define your main model here
class VeryComplexModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
        self.backbone = timm.create_model("vit_small_patch14_dinov2.lvd142m",
                                          pretrained=True)
        self.mlp = nn.Sequential(nn.Linear(self.backbone.num_features, 128),
                                 nn.ReLU(), nn.Linear(128, 10))

    def forward(self, x):
        x = self.quant(x)
        x = self.mlp(self.backbone(x))
        x = self.dequant(x)
        return x
I still have an error, not for 'quantized::conv2d.new' anymore, but:
NotImplementedError: Could not run 'aten::qscheme' with arguments from the 'CPU' backend.
I made some progress: a very simple model that does not involve this aten::qscheme op works, such as:
# Define your main model here
class VeryComplexModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 2, 3),
            nn.ReLU(),
        )
        self.mlp = nn.Linear(1352, 10)

    def forward(self, x):
        x = self.quant(x)
        x = self.backbone(x)
        x = x.flatten(1)
        x = self.mlp(x)
        x = self.dequant(x)
        return x


# Then, define your LightningModule as usual
class Classifier(L.LightningModule):

    def __init__(self):
        super().__init__()
        # This is mandatory for the callbacks
        self.model = VeryComplexModel()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return [optimizer]
The issue seems to occur in the forward method of timm's VisionTransformer class. Do you have any clue? Maybe a config to exclude the _pos_embed fn from quantization?
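For context, what I had in mind was something along these lines (just a sketch; I have not confirmed that op_name_dict can reach the _pos_embed path, and the regex keys below are guesses based on module names):

from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.utils.constant import FP32

# Guessed op names: try to fall back the backbone's embedding-related modules
# to fp32 so the ops around _pos_embed stay unquantized.
conf = QuantizationAwareTrainingConfig(
    op_name_dict={
        "backbone.patch_embed.*": FP32,
    },
)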
Hi, any news on this issue? The only workaround I found has been to train normally first, export to ONNX, then apply PTQ directly to the ONNX model to avoid QAT altogether, but that's not a real fix :/
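For reference, the workaround looks roughly like this (a sketch only; the paths and the calibration dataloader are placeholders):

import torch
from neural_compressor import PostTrainingQuantConfig, quantization

# 1. Train the fp32 model (VeryComplexModel from above) as usual, then export it
#    with the stock ONNX exporter.
model = VeryComplexModel().eval()
torch.onnx.export(model, torch.randn(1, 3, 518, 518), "fp32_model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=17)

# 2. Run INC post-training quantization directly on the ONNX file.
#    calib_dataloader is a placeholder for any INC-compatible dataloader.
q_model = quantization.fit(
    model="fp32_model.onnx",
    conf=PostTrainingQuantConfig(approach="static"),
    calib_dataloader=calib_dataloader,
)
q_model.save("int8_model.onnx")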
Hi @clementpoiret, sorry for the late response. This is not an export issue: the QAT PyTorch model you generated is invalid. Please refer to this document for the usage of QAT.
Dear @yuwenzho, thanks for your answer. You're right, I certainly have a bug in my callbacks. But even following the doc, I can't export DINOv2 to ONNX:
torch.onnx.errors.SymbolicValueError: ONNX symbolic expected the output of `%permute : Tensor(*, *, *, *, *) = onnx::Transpose[perm=[2, 0, 3, 1, 4]](%reshape), scope: torch.fx.graph_module.GraphModule:: # <eval_with_key>.31:48:0` to be a quantized tensor. Is this likely due to missing support for quantized `onnx::Transpose`.
Here is a complete code snippet:
import os

import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.config import Torch2ONNXConfig
from neural_compressor.training import prepare_compression
from torch import optim, utils
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from tqdm import tqdm


# Define your main model here
class VeryComplexModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model("vit_small_patch14_dinov2.lvd142m",
                                          pretrained=True)
        self.clf_layers = nn.Sequential(
            nn.Linear(self.backbone.num_features, 128), nn.ReLU(),
            nn.Linear(128, 10))

    def forward(self, x):
        # x = x.repeat(1, 3, 1, 1)
        # x = F.interpolate(x, size=(518, 518))
        x = self.backbone(x)
        x = self.clf_layers(x)
        return x


criterion = nn.CrossEntropyLoss()
model = VeryComplexModel()

dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)


def train(model, steps=10):
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    model.train()
    for epoch in range(2):
        for i, (data, target) in enumerate(tqdm(train_loader)):
            if i > steps:
                break

            # repeat and interpolate to match the input shape
            data = data.repeat(1, 3, 1, 1)
            data = F.interpolate(data, size=(518, 518))

            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()


conf = QuantizationAwareTrainingConfig()
compression_manager = prepare_compression(model, conf)
compression_manager.callbacks.on_train_begin()
model = compression_manager.model
train(model)
compression_manager.callbacks.on_train_end()
compression_manager.save("./output")

# Export as ONNX
model.export(
    "int8_model.onnx",
    Torch2ONNXConfig(
        dtype="int8",
        opset_version=17,
        quant_format="QDQ",
        example_inputs=torch.randn(1, 3, 518, 518),
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={
            "input": {
                0: "batch_size"
            },
            "output": {
                0: "batch_size"
            },
        },
    ))
This error means that quantized transpose is not supported.
@PenghuiCheng I tried falling back transpose to FP32, but it doesn't work. Could you please help check?
from neural_compressor.utils.constant import FP32

conf = QuantizationAwareTrainingConfig(
    op_type_dict={"transpose": FP32},
)
Hi @clementpoiret, as @yuwenzho mentioned above, quantized transpose is not supported by the ONNX exporter. For the full list of TorchScript operators supported or unsupported by ONNX export, please refer to ONNX SUPPORTED TORCHSCRIPT OPERATORS. While INC can fall back PyTorch modules that perform quantized operations to fp32 (typically those defined for weighted operations like Linear and Conv), operators like transpose are traced and quantized by PyTorch itself, so they cannot be set to fp32 by INC during quantization.
For your case, you can either create a symbolic function that converts the operator and register it as a custom symbolic function, or contribute the same symbolic function to torch.onnx in PyTorch. For more details, you can refer to adding quantized ops.
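As a rough illustration of the first option (untested; whether quantized_args composes with the stock permute symbolic this way should be verified against your PyTorch version):

import torch
from torch.onnx import register_custom_op_symbolic, symbolic_helper
from torch.onnx.symbolic_opset9 import permute as _permute

# Sketch: reuse the stock permute symbolic but wrap it with quantized_args so a
# quantized input is dequantized, transposed, and re-quantized on the way out.
quantized_permute = symbolic_helper.quantized_args(True)(_permute)

register_custom_op_symbolic("aten::permute", quantized_permute, opset_version=17)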
Dear all,
Because we use PyTorch Lightning, and to make Intel Neural Compressor easy to adopt in our team, I am building Lightning callbacks that call your hooks at the right points of Lightning's training loop. Here is the current state of the project: https://github.com/clementpoiret/lightning-nc
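The rough shape of the idea is the following (a simplified sketch, not the actual code of the repository; hook placement and how the prepared model is handed back to the LightningModule are simplified):

import lightning as L
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.training import prepare_compression


class INCQATCallback(L.Callback):
    """Simplified sketch: forward Lightning's training-loop events to INC."""

    def __init__(self, config: QuantizationAwareTrainingConfig):
        self.config = config
        self.manager = None

    def on_fit_start(self, trainer, pl_module):
        # Prepare the wrapped nn.Module (pl_module.model) for QAT and tell INC
        # that training is about to start.
        self.manager = prepare_compression(pl_module.model, self.config)
        self.manager.callbacks.on_train_begin()

    def on_fit_end(self, trainer, pl_module):
        # Convert the fake-quantized modules once training is done.
        self.manager.callbacks.on_train_end()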
Unfortunately, I face issues when trying to export the quantized models to ONNX. Exporting an fp32 model causes no issues.
Here is a toy example you can play with (requires torch 2.1 and lightning 2.1):
When calling the export(...) fn, I end up with the following error:
Do you have any clue?
I tried training completely on CPU by setting accelerator="cpu" on the Trainer; same issue.
Thanks a lot, Clément.