Closed SeTriones closed 2 years ago
TensorRT performs "kernel auto-tuning" which essentially selects the fastest kernels for your models on your specific device. There can be a small amount of jitter in this step, for a variety of reasons, leading to different kernels being selected & thus different perf.
You can check whether the selected kernels are in fact different to confirm this.
Also, this looks like ~1.5% perf jitter for your model. Is this an issue in your application, or is this just out of curiosity? Have you seen larger variance between runs?
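(As a concrete way to quantify the kind of run-to-run latency variance discussed above, a small timing sketch like the following could be used. This is illustrative only and not from the thread; `trt_model` and `x` are assumed placeholders for the compiled module and a matching CUDA input.)

```python
# Rough latency-variance check (illustrative sketch, not from this thread).
import torch

def mean_latency_ms(model, x, iters=200, warmup=50):
    # Warm up so one-time setup costs are excluded from the measurement.
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

# Run this across several separate compilations and compare the means to see
# how large the run-to-run jitter actually is.
print(mean_latency_ms(trt_model, x))
```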
@ncomly-nvidia this is just out of curiosity. I'm running more experiments on the following model architectures:
efficientnet-b2, vit, yolov5s (v6.0), yolov5m (v6.0), yolov5x (v6.0), tsm (batch 16), SwinTransformer3D, bert, transformer
This issue has not seen activity for 90 days. Remove the stale label or comment, or it will be closed in 10 days.
Hi @SeTriones, how have your other experiments gone? Are there other discrepancies in results or performance in the models you listed above that concern you?
This issue has not seen activity for 90 days. Remove the stale label or comment, or it will be closed in 10 days.
❓ Question
I'm trying to run a pretrained resnet50 model from torchvision.models with enabled_precisions set to torch.half. Each time I load the same resnet50 TorchScript and feed it the same input (set to zero using np.zeros), but after running several times I've found that the output is not stable.
What you have already tried
I've tried two ways:
I wonder whether there is some random behavior in `torch_tensorrt.compile()` when enabled_precisions is set to torch.half.

Environment

How you installed PyTorch (conda, pip, libtorch, source): pip

Additional context
The Python code producing the unstable output is below:
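(The script itself was not preserved in the thread as captured here; the following is a minimal sketch of the kind of reproduction described above, assuming torchvision's pretrained resnet50, FP16 compilation via torch_tensorrt.compile, and an all-zeros input built with np.zeros.)

```python
import numpy as np
import torch
import torch_tensorrt
import torchvision.models as models

# Build and script a pretrained resnet50, then compile it with FP16 enabled.
model = models.resnet50(pretrained=True).eval().cuda()
scripted = torch.jit.script(model)

trt_model = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
)

# All-zeros input, as in the issue description.
x = torch.from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float16)).cuda()
with torch.no_grad():
    out = trt_model(x)

# Print a slice of the output to compare across separate runs of this script.
print(out.float().cpu().numpy()[0, :10])
```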
Two iterations produce different outputs: Iteration 1:
Iteration 2:
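(The printed outputs were not preserved here. One rough way to quantify how far apart two such iterations are, as an illustration rather than part of the original report, is to save each output as a NumPy array and compare; the file names below are hypothetical.)

```python
import numpy as np

# Hypothetical files holding the outputs from the two iterations.
out1 = np.load("iteration1.npy")
out2 = np.load("iteration2.npy")

abs_diff = np.abs(out1 - out2)
rel_diff = abs_diff / (np.abs(out1) + 1e-6)
print("max abs diff:", abs_diff.max())
print("max rel diff:", rel_diff.max())
```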