Description
I am trying to run a transformer model on TensorRT. More specifically, my model is based on vitb_256_mae, and I have already converted it to ONNX format. To run it with the TensorRT runtime, I wrote a Python script that builds an FP16 engine from the FP32 ONNX model, and that works fine. However, when I tried to go from FP32 to INT8 using your snippet, I got stuck on some CUDA driver errors.
My model takes two inputs, template and search, and produces 5 outputs.
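For context, the FP16 path that already works looks roughly like this. This is a minimal sketch rather than my exact script; the function name and paths are just illustrative, and only the TensorRT calls are the real API.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)

def build_fp16_engine(onnx_path, engine_path):
    # Parse the ONNX model into an explicit-batch network
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    # Build a serialized FP16 engine with a 1 GiB workspace
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
    config.set_flag(trt.BuilderFlag.FP16)
    engine_bytes = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)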
Important changes
To create optimization profiles for multiple inputs, I tweaked your onnx_to_tensorrt.py a little bit:
def create_optimization_profiles(builder, inputs, batch_sizes=[1,8,16,32,64]):
    # Check if all inputs have a fixed explicit batch, so we create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        #print(profile.get_shape("template"))
        #print(profile.get_shape("search"))
        return [profile]
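For reference, this is roughly how the returned profiles are consumed later in my script, continuing from a build setup like the FP16 sketch above (same builder, network and config). It is a sketch, not the exact code, and I am not sure whether config.set_calibration_profile is actually required for INT8 here, so treat that line as my assumption.

# Sketch: add the profiles to the builder config (variable names are from my
# local copy of onnx_to_tensorrt.py and may differ from the upstream script).
inputs = [network.get_input(i) for i in range(network.num_inputs)]  # template, search
profiles = create_optimization_profiles(builder, inputs)
for profile in profiles:
    config.add_optimization_profile(profile)
# Possibly also needed for explicit-batch INT8 calibration (my assumption):
config.set_calibration_profile(profiles[0])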
Environment
TensorRT Version: 8.6.1 (installed from pip, not built from source)
GPU Type: RTX 3090 Ti
Nvidia Driver Version: nvidia-driver-530
CUDA Version: 12.1
CUDNN Version: not installed
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): 3.10.12
More info
I tried an INT8 conversion with a single dynamic input by following this issue:
https://github.com/NVIDIA/TensorRT/issues/289
The code worked for AlexNet, which is why I think the error might be caused by the number of inputs; alternatively, I may need to make some changes to the ImagenetCalibrator file (a rough sketch of what I mean is below).
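To show what I mean about changing the calibrator, here is a rough sketch of what I think a two-input calibrator would have to look like, so that get_batch returns one device pointer per binding (template and search). The class name and the way batches are supplied are hypothetical; only the IInt8EntropyCalibrator2 methods are the real TensorRT interface.

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

class TwoInputCalibrator(trt.IInt8EntropyCalibrator2):
    """Hypothetical sketch: feeds both the 'template' and 'search' bindings per batch."""

    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of (template, search) float32 array pairs
        self.cache_file = cache_file
        self.d_template = None
        self.d_search = None

    def get_batch_size(self):
        return 1  # calibration batch size used when preparing the arrays

    def get_batch(self, names):
        try:
            template, search = next(self.batches)  # e.g. (1,3,128,128) and (1,3,256,256)
        except StopIteration:
            return None  # no more calibration data
        if self.d_template is None:
            self.d_template = cuda.mem_alloc(template.nbytes)
            self.d_search = cuda.mem_alloc(search.nbytes)
        cuda.memcpy_htod(self.d_template, np.ascontiguousarray(template))
        cuda.memcpy_htod(self.d_search, np.ascontiguousarray(search))
        # Return one device pointer per requested binding, in the order TensorRT asks for.
        pointers = {"template": int(self.d_template), "search": int(self.d_search)}
        return [pointers[name] for name in names]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)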
Thanks for reading, have a good day!
Output errors
2023-08-10 11:27:20 - main - INFO - TRT_LOGGER Verbosity: Severity.ERROR
[08/10/2023-11:27:23] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:177: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 4**30 # 1GiB
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.FP16
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.INT8
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Collecting calibration files from: /home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/imagenet/val
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Number of Calibration Files found: 10869
2023-08-10 11:27:23 - ImagenetCalibrator - WARNING - Capping number of calibration images to max_calibration_size: 512
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/10/2023-11:27:23] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
2023-08-10 11:27:23 - main - DEBUG - === Network Description ===
2023-08-10 11:27:23 - main - DEBUG - Input 0 | Name: template | Shape: (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - Input 1 | Name: search | Shape: (1, 3, 256, 256)
2023-08-10 11:27:23 - main - DEBUG - Output 0 | Name: pred_boxes | Shape: (1, 1, 4)
2023-08-10 11:27:23 - main - DEBUG - Output 1 | Name: score_map | Shape: (1, 1, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 2 | Name: size_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 3 | Name: offset_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 4 | Name: backbone_feat | Shape: (1, 320, 768)
2023-08-10 11:27:23 - main - DEBUG - === Optimization Profiles ===
2023-08-10 11:27:23 - main - DEBUG - template - OptProfile 0 - Min (1, 3, 128, 128) Opt (1, 3, 128, 128) Max (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - search - OptProfile 0 - Min (1, 3, 256, 256) Opt (1, 3, 256, 256) Max (1, 3, 256, 256)
2023-08-10 11:27:23 - main - INFO - Building Engine...
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:222: DeprecationWarning: Use build_serialized_network instead.
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
[08/10/2023-11:27:26] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
2023-08-10 11:27:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/10/2023-11:27:27] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 227, in
main()
File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 222, in main
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered