@peri044 Can you take a look at this?
Might be related to #898? @henrycharlesworth Can you try building and running Torch-TensorRT with the TensorRT NGC containers?
I was able to bypass my issue using nvcr.io/nvidia/tensorrt:22.02-py3, PyTorch 1.10, and Torch-TensorRT commit 11bcb98d on master.
I'm likewise getting a segmentation fault in torch_tensorrt.compile when trying to convert a model to int8. The issue does not occur with float16 or float32. I haven't tried building from source with debugging symbols yet, but gdb tracked it to libtorchtrt.so and torch_tensorrt::core::MapInputsAndDetermineDTypes().
I'm on Torch 1.11.0+cu115, Torch-TensorRT 1.1.0.
I traced the segfault in my case to line 314 here: https://github.com/pytorch/TensorRT/blob/40f8b44d95e1bf0912757377eb6acba666963e9d/core/compiler.cpp#L311-L316
As far as I can tell, first_use_type_map does not contain the key in, so calling ->second on the result of .find(in) (which is end()) is undefined behavior. The code appears to be trying to check for this case, but by that point it is too late.
I have a very hasty patch that gets me past this point (I'll open a PR if anyone wants it, though I'm not sure it actually solves much), but it then just leads me to https://github.com/pytorch/TensorRT/issues/922.
diff --git a/core/compiler.cpp b/core/compiler.cpp
index b684b808..0d82bf11 100644
--- a/core/compiler.cpp
+++ b/core/compiler.cpp
@@ -311,8 +311,9 @@ void MapInputsAndDetermineDTypes(
   for (auto& in : g->inputs()) {
     if (static_params.find(in) == static_params.end()) {
       ir::Input& spec = cfg.convert_info.inputs.find(in)->second;
-      auto est_type_opt = first_use_type_map.find(in)->second;
-      if (est_type_opt && !spec.dtype_is_user_defined) {
+      auto count = first_use_type_map.count(in);
+      if (count && !spec.dtype_is_user_defined) {
+        auto est_type_opt = first_use_type_map.find(in)->second;
         // If we can calculate the type from the graph and the type was not defined by the user then use the calculated
         // type
         LOG_INFO(
@@ -320,17 +321,18 @@ void MapInputsAndDetermineDTypes(
             << in->debugName() << " has type " << est_type_opt.value()
             << ". If this is incorrect explicitly set dtype for input and file a bug");
         spec.dtype = util::ScalarTypeToTRTDataType(est_type_opt.value());
-      } else if (!est_type_opt && !spec.dtype_is_user_defined) {
+      } else if (!count && !spec.dtype_is_user_defined) {
         // If we cannot calculate the type and the user did not define the type, then default to FP32
         LOG_WARNING(
             "Cannot infer input type from calcuations in graph for input "
             << in->debugName() << ". Assuming it is Float32. If not, specify input type explicity");
         spec.dtype = nvinfer1::DataType::kFLOAT;
       } else if (spec.dtype_is_user_defined && cfg.partition_info.enabled) {
-        if (!est_type_opt) {
+        if (!count) {
           LOG_INFO("Cannot infer input tensor dtype in graph. Using user provided input dtype settings");
           first_use_type_map[in] = {util::TRTDataTypeToScalarType(cfg.convert_info.inputs.find(in)->second.dtype)};
         } else {
+          auto est_type_opt = first_use_type_map.find(in)->second;
           if (util::TRTDataTypeToScalarType(cfg.convert_info.inputs.find(in)->second.dtype) != est_type_opt.value()) {
             std::stringstream ss;
             ss << "For input " << in->debugName() << ", found user specified input dtype as ";
Hi,
We faced the int8 bug too, in the official docker image, version 22.05. The initial issue is solved with @Hodapp87's patch, but in our case it leads to another issue, not #922 as reported by @Hodapp87.
This is the exception traceback:
Traceback (most recent call last):
  File "./main.py", line 19, in <module>
    trt_ts_module = torch_tensorrt.compile(
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/_compile.py", line 109, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/ts/_compiler.py", line 113, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: [Error thrown at core/conversion/var/Var.cpp:132] Expected isITensor() to be true but got false
Requested ITensor from Var, however Var type is c10::IValue
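For reference, the failing call goes through the TorchScript frontend, so the equivalent direct call looks roughly like this (a minimal sketch only; the module path, input shape, and dtype are illustrative assumptions, not our exact code):

import torch
import torch_tensorrt

# Illustrative: a TorchScript module that already carries quantization nodes
ts_mod = torch.jit.load("resnet50_quantized_ts.pt")

trt_mod = torch_tensorrt.ts.compile(
    ts_mod,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.int8},
)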
Does anyone know how to solve this?
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
@peri044 Any news on this issue? We still cannot use our models with int8 precision because of this bug.
@peri044 Can we please confirm the PTQ notebook is working properly, then go after this bug? P1.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
We think this is fixed. Dheeraj to check.
Thanks, I will check as soon as possible.
@ivan94fi have you been able to check? We would like to close this out.
Hi, I can confirm that our model is now correctly converted when using int8 precision with version 1.3.0 of Torch-TensorRT. Thank you!
Bug Description
I'm using torch_tensorrt to try to quantize a pretrained ResNet50 model (roughly following the steps here), but I am getting a segmentation fault. I've tried running the code on two different machines using the latest Docker image here, but I get the same segmentation fault on both. Also, when I compile the model to TensorRT with fp16 instead of quantizing, it works fine.
To Reproduce
Reduced example code (main.py, with the calibration helpers in utils.py):
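A minimal sketch of the flow the scripts implement (illustrative only: it assumes torchvision's pretrained ResNet50, NVIDIA's pytorch-quantization toolkit, and a calibrate_model helper in utils.py whose exact signature is assumed):

import torch
import torch_tensorrt
import torchvision

from pytorch_quantization import quant_modules
from utils import calibrate_model  # utils.py calibration helper; signature assumed

# Swap torch.nn layers for their quantized counterparts before the model is built
quant_modules.initialize()

# Pretrained ResNet50 as described in the report
model = torchvision.models.resnet50(pretrained=True).eval().cuda()

# Run a few batches through the model so the quantizers collect statistics;
# the dataloader is a dummy stand-in for a real calibration set
calib_dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 3, 224, 224), torch.zeros(64, dtype=torch.long)
)
calib_dataloader = torch.utils.data.DataLoader(calib_dataset, batch_size=8)
calibrate_model(model, calib_dataloader)

# Trace the calibrated model and compile it with int8 enabled;
# the torch_tensorrt.compile call is where the segfault occurs
example_input = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    ts_model = torch.jit.trace(model, example_input)

trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.int8},
)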
When I run main.py and it gets to the point of compiling the model, I get:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
If I remove quant_modules.initialize() and calibrate_model(...) and change the enabled precision to torch.float16 instead, the model compiles without any error.
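For contrast, the fp16 variant that compiles successfully is roughly (same illustrative assumptions as the sketch above):

import torch
import torch_tensorrt
import torchvision

# Same traced ResNet50, but without quant_modules.initialize() or calibrate_model(...),
# and with only fp16 enabled; this compiles without error
model = torchvision.models.resnet50(pretrained=True).eval().cuda()
ts_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224, device="cuda"))
trt_model_fp16 = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float16},
)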
Expected behavior
Would expect the int8 quantized model to compile without issue.
Environment
Using the NVIDIA Docker image (22.02-py3). Tested on an RTX 3090 GPU and a GTX 1650.
Additional context