huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

[BUG] ResNet50 model with wrong precision after quantization with TensorRT INT8 PTQ #1412

Closed lixiaolx closed 1 year ago

lixiaolx commented 2 years ago

Describe the bug
With timm 0.6.7, I take timm's resnet50, first convert the model to ONNX, and then use TensorRT's INT8 PTQ quantization. The quantized model is evaluated on the ImageNet-1k validation set (50,000 images), and the accuracy loss compared to the original FP32 model is large.
ImageNet-1k validation (50k), FP32 result: batch_size:64 Acc@1 71.341 (28.659) Acc@5 88.570 (11.430)
ImageNet-1k validation (50k), PTQ INT8 result: batch_size:1 Acc@1 4.534 (95.466) Acc@5 8.576 (91.424)

In the same environment, torchvision's resnet50 shows essentially no accuracy loss before and after quantization.

To Reproduce
Steps to reproduce the behavior follow the referenced usage scripts: https://github.com/rmccorm4/tensorrt-utils/tree/master/int8/calibration
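For reference, the PTQ path in those scripts comes down to building an engine with an INT8 entropy calibrator. A minimal sketch of that piece, assuming TensorRT 8.x with pycuda (class and cache-file names here are illustrative, not the exact code from the linked repo):

import numpy as np
import pycuda.autoinit  # noqa: F401  creates a CUDA context for the calibrator
import pycuda.driver as cuda
import tensorrt as trt

class ImageNetEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed float32 NCHW calibration batches to TensorRT during PTQ."""

    def __init__(self, batches, cache_file="resnet50_int8.cache"):
        super().__init__()
        self.batches = batches          # list of numpy arrays, e.g. a few hundred val images
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                 # tells TensorRT the calibration data is exhausted
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# during engine build, INT8 mode is enabled and the calibrator attached to the builder config:
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = ImageNetEntropyCalibrator(calibration_batches)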

rwightman commented 2 years ago

@lixiaolx the accuracy is wrong before quantization as well so not sure what your setup is, it should be between 80.3 and 80.4 for default image pipeline.

lixiaolx commented 2 years ago

the accuracy is wrong before quantization as well so not sure what your setup is, it should be between 80.3 and 80.4 for default image pipeline.

First of all, sorry, the FP32 value above was wrong. I re-measured the accuracy under FP32:
ImageNet-1k validation (50k) result: batch_size:64 Acc@1 80.078 (19.922) Acc@5 94.568 (5.432)
But the result with TensorRT's INT8 quantization is still very poor:
ImageNet-1k validation (50k) result: batch_size:64 Acc@1 4.027 (95.973) Acc@5 7.830 (92.169)
In the same environment and with the same script, there is basically no accuracy loss before and after quantization when using the torchvision model. Finally, here is my test script setup:

fp32_test.py

import os
import torch
import tensorrt as trt
import torch.nn as nn
from torch.nn import functional as F
import torchvision
import torchvision.transforms as transforms
import timm
from torchvision.datasets import ImageFolder
import time
from timm.utils import accuracy, AverageMeter
from torch.utils.data import DataLoader
from tqdm import tqdm
from collections import OrderedDict

net = timm.create_model('resnet50', pretrained=True)
model = torch.jit.script(net).eval().cuda()

# import torchvision.models as models
# mod = models.resnet50(pretrained=True).eval()
# mod_jit = torch.jit.script(mod)
# model = mod_jit.cuda()

dummy_min = torch.rand(64, 3, 224, 224)
with torch.no_grad():
    fp32_out = model(dummy_min.cuda())

val_transforms = torchvision.transforms.Compose([
                                            torchvision.transforms.Resize(224),
                                            torchvision.transforms.CenterCrop(224),
                                            torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

root = "./imagenet1k_pytorch/val"
drop_last=True
num_workers=8
batch_size=64

batch_time = AverageMeter()
losses = AverageMeter()
top1 = AverageMeter()
top5 = AverageMeter()
dataset = ImageFolder(root, transform=val_transforms)
loader = DataLoader(dataset, batch_size, drop_last=drop_last, num_workers=num_workers)

for batch_idx, batch in enumerate(tqdm(loader)):
    img, label = batch
    end = time.time()
    img, label = img.cuda(), label.cuda()
    with torch.no_grad():
        outputs = model(img)
    acc1, acc5 = accuracy(outputs.detach(), label, topk=(1, 5))
    losses.update(0., outputs.size(0))
    top1.update(acc1.item(), outputs.size(0))
    top5.update(acc5.item(), outputs.size(0))
    batch_time.update(time.time() - end)
    top1a, top5a = top1.avg, top5.avg

results = OrderedDict(
    model="UTF-8",
    top1=round(top1a, 4), top1_err=round(100 - top1a, 4),
    top5=round(top5a, 4), top5_err=round(100 - top5a, 4),
    img_size=batch_size)
print("fp32-benckmark result:")
line = 'batch_size:{} * Acc@1 {:.3f} ({:.3f})\t Acc@5 {:.3f} ({:.3f})\t time@avg{:.4f}'.format( batch_size, results['top1'], results['top1_err'], results['top5'], results['top5_err'], batch_time.avg)
print("imageNet1k, valadation 5w result:")
print(line)
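One thing worth noting about the script above: the Resize(224)/CenterCrop(224) pipeline is not timm's default eval pipeline for resnet50, which resizes the short side to roughly input_size / crop_pct (about 256) before the 224 center crop; that likely accounts for the small FP32 gap versus the ~80.3 figure rwightman quotes below. A sketch of deriving the expected transform from the model's own config (timm 0.6.x API; `net` is the un-scripted timm model from the top of the script):

from timm.data import resolve_data_config, create_transform

data_config = resolve_data_config({}, model=net)        # input_size, crop_pct, interpolation, mean/std
timm_val_transforms = create_transform(**data_config)   # eval transform matching the pretrained weights
print(data_config)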

trans_onnx.py

import timm
import torch
import time
import onnx
import os
import onnxruntime
import numpy as np

# onnx_file_path = "./resnet50_bs1.onnx"
onnx_file_path = "./resnet50_bs64.onnx"

dummy_min = torch.rand(64, 3, 224, 224)
print('input_bach: ', dummy_min.shape)

model = timm.create_model('resnet50', pretrained=True)
net = torch.jit.script(model).eval().cuda()

# import torchvision.models as models
# mod = models.resnet50(pretrained=True).eval()
# mod_jit = torch.jit.script(mod)
# net = mod_jit.cuda()

native_net = net
t3 = time.time()
with torch.no_grad():
    native_out = native_net(dummy_min.cuda())
t4 = time.time()

torch.onnx.export(net, dummy_min.cuda(), onnx_file_path, verbose=True, example_outputs=native_out)
session = onnxruntime.InferenceSession(onnx_file_path)
out_r = session.run(None, {"x.1": np.ascontiguousarray(dummy_min)})
print(len(out_r))
print(out_r)
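A quick sanity check that can be appended here to confirm the exported graph matches the PyTorch output before TensorRT enters the picture (a small addition, not part of the original script):

# max absolute difference between ONNX Runtime and the scripted PyTorch model;
# for an FP32 export this should be on the order of 1e-5 or smaller
max_diff = np.abs(out_r[0] - native_out.cpu().numpy()).max()
print("max |onnxruntime - pytorch| =", max_diff)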

rwightman commented 2 years ago

@lixiaolx have you tried the V2 torchvision weights? They have a training recipe that's closer to the current ones in timm: https://pytorch.org/vision/stable/models.html#initializing-pre-trained-models ... I feel there is a chance that training recipe could result in weights that are less 'quantizable' without further fine-tuning
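For reference, loading the V2 weights needs torchvision >= 0.13 and looks roughly like this:

from torchvision.models import resnet50, ResNet50_Weights

# IMAGENET1K_V2 is the newer training recipe; IMAGENET1K_V1 matches the classic weights
model_v2 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()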

you could also try tv_resnet50 in timm ... it's the v1 torchvision weights with the timm ResNet code, so it would verify whether any modelling changes I've made (i.e. control flow or other things) may have hurt the quantization workflow
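That check is a one-line swap in the scripts above:

# same timm ResNet code, but loaded with the original torchvision v1 weights
net = timm.create_model('tv_resnet50', pretrained=True)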

lixiaolx commented 2 years ago

have you tried V2 torchvision weights?

Haven't tried it yet; my torchvision version is too old and has no interface for the V2 weights.

you could also try tv_resnet50 in timm ... it's the v1 torchvision weights with the timm ResNet code, so it would verify whether any modelling changes I've made (i.e. control flow or other things) may have hurt the quantization workflow

I tested tv_resnet50 and found that its accuracy before and after quantization basically matches torchvision's:
Before quantization: batch_size:64 Acc@1 75.692 (24.308) Acc@5 92.770
After quantization: batch_size:64 Acc@1 75.786 (24.214) Acc@5 92.732 (7.268)

At present, based on your feedback and my tests, timm's resnet50 must differ from torchvision's in some way. Can you help me solve it?

lixiaolx commented 2 years ago

@rwightman Hello, I used Torch-TensorRT to dump the lowered TorchScript pass for each model and compared the torchvision and timm versions. I found that four additional biases are referenced in the timm graph:
1. %self.layer1.0.downsample.1.bias
2. %self.layer2.0.downsample.1.bias
3. %self.layer3.0.downsample.1.bias
4. %self.layer4.0.downsample.1.bias
There are also differences in how the subsequent conv and bn ops are constructed. I am not sure whether these differences are what causes the quantized accuracy to deteriorate. Can you help confirm and check this? Can you help resolve these differences?

[screenshot of the lowered graph diff attached]
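One way to double-check whether the difference lives in the checkpoints themselves, rather than in the TorchScript lowering, is to diff the two state dicts directly. A small sketch (not part of the original report):

import timm
import torchvision.models as models

timm_sd = timm.create_model('resnet50', pretrained=True).state_dict()
tv_sd = models.resnet50(pretrained=True).state_dict()

# parameter/buffer names present in only one of the two models
print("only in timm:       ", sorted(set(timm_sd) - set(tv_sd)))
print("only in torchvision:", sorted(set(tv_sd) - set(timm_sd)))

# shared names whose shapes disagree
for k in sorted(set(timm_sd) & set(tv_sd)):
    if timm_sd[k].shape != tv_sd[k].shape:
        print("shape mismatch:", k, tuple(timm_sd[k].shape), tuple(tv_sd[k].shape))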

rwightman commented 1 year ago

not much I can do here; it's a difference in the weights only, so it's an issue with the quantization / weights combination, not the modelling code