pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] Sometimes inference time is too slow.. #765

Closed · socome closed this 2 years ago

socome commented 2 years ago

❓ Question

Thank you for this nice project. I successfully converted my model, which takes multispectral images as input, using Torch-TensorRT as below.

    import torch
    import torch_tensorrt

    model = torch.load(model_path)['model']
    model = model.to(device)
    model.eval()

    scripted_model = torch.jit.script(model)

    # Static input shapes: vis [1, 3, 512, 640], lwir [1, 1, 512, 640]

    compile_settings = {
        "inputs": [torch_tensorrt.Input(
            min_shape=[1, 3, 512, 640],
            opt_shape=[1, 3, 512, 640],
            max_shape=[1, 3, 512, 640],
            dtype=torch.half),
            torch_tensorrt.Input(
            min_shape=[1, 1, 512, 640],
            opt_shape=[1, 1, 512, 640],
            max_shape=[1, 1, 512, 640],
            dtype=torch.half
        )],
        "enabled_precisions": {torch.half}  # Run with FP16
    }

    trt_ts_module = torch_tensorrt.ts.compile(scripted_model, **compile_settings)

    # Example inputs: FP16 for the compiled module, FP32 for the original model
    fake_vis_fp16 = torch.ones((1, 3, 512, 640)).half().cuda()
    fake_lwir_fp16 = torch.ones((1, 1, 512, 640)).half().cuda()

    fake_vis_fp32 = torch.ones((1, 3, 512, 640)).float().cuda()
    fake_lwir_fp32 = torch.ones((1, 1, 512, 640)).float().cuda()

    torch.jit.save(trt_ts_module, "MLPD_trt_torchscript_module.ts") # save the TRT embedded Torchscript
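
For reference, a minimal sketch of reloading the saved module and running it. This assumes the model takes the vis and lwir tensors in that order, matching the two Input specs above:

    import torch

    # Reload the TRT-embedded TorchScript module saved above
    trt_model = torch.jit.load("MLPD_trt_torchscript_module.ts").cuda()

    vis = torch.ones((1, 3, 512, 640), dtype=torch.half, device="cuda")
    lwir = torch.ones((1, 1, 512, 640), dtype=torch.half, device="cuda")

    with torch.no_grad():
        detections = trt_model(vis, lwir)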

Then I tested the inference time of the model and found that it is sometimes much slower than expected, as shown below.

[screenshot: per-iteration inference times showing intermittent slow frames]

How can I solve this problem? The accuracy (miss rate) of the converted model is the same as that of the original model.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Additional context

dytan commented 2 years ago

Thank you, guys. I encountered the same situation. The model accepts an image and outputs something like a segmentation mask. When testing the inference time in a loop with the same input, some frames are much slower, and this occurs at regular intervals. The Python lib was built from the master branch several days ago, following the Dockerfile build instructions based on pytorch:21.10, and tested from Python with FP16 on a T4 GPU.

[screenshot: per-frame inference times showing regularly recurring slow frames]
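
One way to check whether those slow frames are real GPU latency or a host-side measurement artifact is to time each iteration with CUDA events, which are recorded on the GPU stream itself. A minimal sketch, with a stand-in model and input in place of the real ones:

    import torch

    # Stand-in model and input; replace with the compiled module and real frames
    model = torch.nn.Conv2d(3, 8, 3).half().cuda().eval()
    inp = torch.randn(1, 3, 512, 640, dtype=torch.half, device="cuda")

    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)

    timings = []
    with torch.no_grad():
        for _ in range(100):
            starter.record()
            model(inp)
            ender.record()
            torch.cuda.synchronize()  # wait for the GPU before reading the timer
            timings.append(starter.elapsed_time(ender))  # milliseconds

    print(f"min {min(timings):.2f} ms / max {max(timings):.2f} ms")

If the event timings stay flat while wall-clock timings spike, the slowdown is on the host side (for example, missing synchronization) rather than in the engine itself.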

narendasan commented 2 years ago

Make sure you are benchmarking with the proper settings. Things like synchronization, the cuDNN benchmark setting, and determinism affect your results. Here is the script we maintain for benchmarking models: https://github.com/NVIDIA/Torch-TensorRT/blob/master/examples/benchmark/py/perf_run.py
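
A minimal sketch of that kind of benchmarking setup (warmup iterations, the cuDNN benchmark flag, and explicit synchronization before reading the clock). The file name and shapes follow the example above, and the iteration counts are arbitrary:

    import time
    import torch

    torch.backends.cudnn.benchmark = True  # let cuDNN autotune convolution kernels

    model = torch.jit.load("MLPD_trt_torchscript_module.ts").cuda()
    vis = torch.ones((1, 3, 512, 640), dtype=torch.half, device="cuda")
    lwir = torch.ones((1, 1, 512, 640), dtype=torch.half, device="cuda")

    with torch.no_grad():
        # Warmup: the first iterations pay one-time allocation and tuning costs
        for _ in range(20):
            model(vis, lwir)
        torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(100):
            model(vis, lwir)
        torch.cuda.synchronize()  # ensure all queued GPU work has finished
        elapsed = time.perf_counter() - start

    print(f"average latency: {elapsed / 100 * 1000:.2f} ms")

Without the synchronize calls, CUDA's asynchronous execution means the host timer can stop before the GPU has finished, so individual iterations appear fast until a blocking operation absorbs the accumulated queue, which shows up as periodic "slow" frames.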

github-actions[bot] commented 2 years ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.