pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] Sometimes inference time is too slow.. #765

Closed · socome closed this 2 years ago

socome commented 2 years ago

❓ Question

Thank you for this nice project. I successfully converted my model, which takes multispectral images as input, using Torch-TensorRT as below.

    import torch
    import torch_tensorrt

    model = torch.load(model_path)['model']
    model = model.to(device)
    model.eval()

    scripted_model = torch.jit.script(model)

    # Static input shapes: vis [1, 3, 512, 640], lwir [1, 1, 512, 640]

    compile_settings = {
        "inputs": [torch_tensorrt.Input(
            min_shape=[1, 3, 512, 640],
            opt_shape=[1, 3, 512, 640],
            max_shape=[1, 3, 512, 640],
            dtype=torch.half),
            torch_tensorrt.Input(
            min_shape=[1, 1, 512, 640],
            opt_shape=[1, 1, 512, 640],
            max_shape=[1, 1, 512, 640],
            dtype=torch.half
        )],
        "enabled_precisions": {torch.half}  # Run with FP16
    }

    trt_ts_module = torch_tensorrt.ts.compile(scripted_model, **compile_settings)

    # Example inputs: FP16 for the compiled module, FP32 for the original model
    fake_vis_fp16 = torch.ones((1, 3, 512, 640)).half().cuda()
    fake_lwir_fp16 = torch.ones((1, 1, 512, 640)).half().cuda()

    fake_vis_fp32 = torch.ones((1, 3, 512, 640)).float().cuda()
    fake_lwir_fp32 = torch.ones((1, 1, 512, 640)).float().cuda()

    torch.jit.save(trt_ts_module, "MLPD_trt_torchscript_module.ts") # save the TRT embedded Torchscript
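
For reference, a minimal sketch of reloading the saved module and running it. This assumes the model takes the vis and lwir tensors in that order, matching the two Input specs above:

    import torch

    # Reload the TRT-embedded TorchScript module saved above
    trt_model = torch.jit.load("MLPD_trt_torchscript_module.ts").cuda()

    vis = torch.ones((1, 3, 512, 640), dtype=torch.half, device="cuda")
    lwir = torch.ones((1, 1, 512, 640), dtype=torch.half, device="cuda")

    with torch.no_grad():
        detections = trt_model(vis, lwir)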

Then I tested the inference time of the model and found that it is sometimes much slower than expected, as shown below.

[screenshot: per-iteration inference times showing intermittent slow frames]

How can I solve this problem? The accuracy (miss rate) of the converted model is the same as that of the original model.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Additional context

dytan commented 2 years ago

Thank you, guys. I encountered the same situation. The model accepts an image and outputs something like a segmentation mask. When testing the inference time in a loop with the same input, some frames are much slower, and this occurs at regular intervals. The Python lib was built from the master branch several days ago, following the Dockerfile build instructions based on pytorch:21.10, and tested from Python with FP16 on a T4 GPU.

[screenshot: per-frame inference times showing regularly recurring slow frames]
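
One way to check whether those slow frames are real GPU latency or a host-side measurement artifact is to time each iteration with CUDA events, which are recorded on the GPU stream itself. A minimal sketch, with a stand-in model and input in place of the real ones:

    import torch

    # Stand-in model and input; replace with the compiled module and real frames
    model = torch.nn.Conv2d(3, 8, 3).half().cuda().eval()
    inp = torch.randn(1, 3, 512, 640, dtype=torch.half, device="cuda")

    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)

    timings = []
    with torch.no_grad():
        for _ in range(100):
            starter.record()
            model(inp)
            ender.record()
            torch.cuda.synchronize()  # wait for the GPU before reading the timer
            timings.append(starter.elapsed_time(ender))  # milliseconds

    print(f"min {min(timings):.2f} ms / max {max(timings):.2f} ms")

If the event timings stay flat while wall-clock timings spike, the slowdown is on the host side (for example, missing synchronization) rather than in the engine itself.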

narendasan commented 2 years ago

Make sure you are benchmarking with the proper settings. Things like synchronization, the cuDNN benchmark setting, and determinism affect your results. Here is the script we maintain for benchmarking models: https://github.com/NVIDIA/Torch-TensorRT/blob/master/examples/benchmark/py/perf_run.py
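
A minimal sketch of that kind of benchmarking setup (warmup iterations, the cuDNN benchmark flag, and explicit synchronization before reading the clock). The file name and shapes follow the example above, and the iteration counts are arbitrary:

    import time
    import torch

    torch.backends.cudnn.benchmark = True  # let cuDNN autotune convolution kernels

    model = torch.jit.load("MLPD_trt_torchscript_module.ts").cuda()
    vis = torch.ones((1, 3, 512, 640), dtype=torch.half, device="cuda")
    lwir = torch.ones((1, 1, 512, 640), dtype=torch.half, device="cuda")

    with torch.no_grad():
        # Warmup: the first iterations pay one-time allocation and tuning costs
        for _ in range(20):
            model(vis, lwir)
        torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(100):
            model(vis, lwir)
        torch.cuda.synchronize()  # ensure all queued GPU work has finished
        elapsed = time.perf_counter() - start

    print(f"average latency: {elapsed / 100 * 1000:.2f} ms")

Without the synchronize calls, CUDA's asynchronous execution means the host timer can stop before the GPU has finished, so individual iterations appear fast until a blocking operation absorbs the accumulated queue, which shows up as periodic "slow" frames.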

github-actions[bot] commented 2 years ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.