pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] Example notebook qat-ptq-workflow yields a calibration error failure #2777

Closed · choosehappy closed this issue 3 months ago

choosehappy commented 4 months ago

Bug Description

All previous cells run as expected, but cell 28 in this example notebook:

https://github.com/pytorch/TensorRT/blob/main/notebooks/qat-ptq-workflow.ipynb

yields this error:


W0425 19:29:52.724189 140194752341824 _compile.py:108] Input graph is a Torchscript module but the ir provided is default (dynamo). Please set ir=torchscript to suppress the warning. Compiling the module with ir=torchscript
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32 or Bool.
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [standardEngineBuilder.cpp::initCalibrationParams::1718] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
----------
RuntimeError                              Traceback (most recent call last)
Cell In[28], line 6
      2 qat_model = torch.jit.load("mobilenetv2_qat.jit.pt").eval()
      3 compile_spec = {"inputs": [torch_tensorrt.Input([64, 3, 224, 224])],
      4                 "enabled_precisions": torch.int8
      5                }
----> 6 trt_mod = torch_tensorrt.compile(qat_model, **compile_spec)

File /usr/local/lib/python3.10/dist-packages/torch_tensorrt/_compile.py:185, in compile(module, ir, inputs, enabled_precisions, **kwargs)
    183         ts_mod = torch.jit.script(module)
    184     assert _non_fx_input_interface(input_list)
--> 185     compiled_ts_module: torch.jit.ScriptModule = torchscript_compile(
    186         ts_mod,
    187         inputs=input_list,
    188         enabled_precisions=enabled_precisions_set,
    189         **kwargs,
    190     )
    191     return compiled_ts_module
    192 elif target_ir == _IRType.fx:

File /usr/local/lib/python3.10/dist-packages/torch_tensorrt/ts/_compiler.py:151, in compile(module, inputs, input_signature, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, capability, num_avg_timing_iters, workspace_size, dla_sram_size, dla_local_dram_size, dla_global_dram_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules, allow_shape_tensors)
    124     raise ValueError(
    125         f"require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: {torch_executed_ops}, torch_executed_modules: {torch_executed_modules}"
    126     )
    128 spec = {
    129     "inputs": input_list,
    130     "input_signature": input_signature,
   (...)
    148     "allow_shape_tensors": allow_shape_tensors,
    149 }
--> 151 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    152 compiled_module: torch.jit.ScriptModule = torch.jit._recursive.wrap_cpp_module(
    153     compiled_cpp_mod
    154 )
    155 return compiled_module

RuntimeError: [Error thrown at core/conversion/conversionctx/ConversionCtx.cpp:169] Building serialized network failed in TensorRT
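For reference, the failing cell boils down to the call below. This is a minimal sketch, assuming the mobilenetv2_qat.jit.pt checkpoint produced earlier in the notebook; per the first warning in the log, ir="torchscript" is passed explicitly since the input is already a TorchScript module, and enabled_precisions is given as a set. Whether this also clears the calibration failure is exactly what this issue is asking.

```python
import torch
import torch_tensorrt

# Load the QAT model serialized earlier in the notebook
# (path taken from the traceback above).
qat_model = torch.jit.load("mobilenetv2_qat.jit.pt").eval()

# Select the TorchScript frontend explicitly, as the warning suggests.
# For a QAT model the quantization scales are baked into the graph,
# so no separate INT8 calibrator should be needed.
trt_mod = torch_tensorrt.compile(
    qat_model,
    ir="torchscript",
    inputs=[torch_tensorrt.Input([64, 3, 224, 224])],
    enabled_precisions={torch.int8},
)
```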

To Reproduce

Steps to reproduce the behavior:

  1. docker run -it --gpus all nvcr.io/nvidia/pytorch:24.03-py3 bash
  2. jupyter notebook
  3. sequentially run cells in qat-ptq-workflow.ipynb

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

narendasan commented 3 months ago

This method has been deprecated in favor of one based on the new TRT quantization toolkit; you can find more info here: https://github.com/pytorch/TensorRT/blob/release/2.3/examples/dynamo/vgg16_fp8_ptq.py
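The linked example follows roughly the pattern below. This is a hedged sketch, not a drop-in replacement: the modelopt calls, config name, and export flow are paraphrased from the release/2.3 example, and the model/dataloader placeholders are assumptions, so defer to the linked script for the authoritative version.

```python
import torch
import torch_tensorrt
import modelopt.torch.quantization as mtq  # NVIDIA TensorRT Model Optimizer

model = ...         # your eager-mode FP32 model, on CUDA
calib_loader = ...  # a small, representative calibration dataloader

def forward_loop(m):
    # Run calibration batches through the model so modelopt
    # can record activation ranges.
    for data, _ in calib_loader:
        m(data.cuda())

# Quantize with the new toolkit, then compile via the dynamo frontend
# instead of the deprecated TorchScript QAT/PTQ path.
quantized = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
with torch.no_grad():
    exported = torch.export.export(
        quantized, (torch.randn(1, 3, 224, 224).cuda(),)
    )
    trt_mod = torch_tensorrt.dynamo.compile(
        exported,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float8_e4m3fn},
    )
```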