johnzlli opened 8 months ago
Hello - since this model is traceable and doesn't appear to have graph breaks, I think ir="dynamo" can generally give a small boost over ir="torch_compile". Additionally, there is an optimization_level parameter for which the maximum is 5. I have added an adapted example below which could help boost performance:
import torch
import torch_tensorrt

# `data`, `model`, and `run_model` are defined in the example code from the question.
x = data.half().cuda()
m = model.half().cuda()
torch._dynamo.reset()  # clear any cached compilation state before recompiling
opt_model = torch_tensorrt.compile(m, ir="dynamo", inputs=[x], enabled_precisions={torch.half}, optimization_level=5)
print(f"trt_dynamo fp16 time: {run_model(x, opt_model)}")
Additionally, if you share the output logs of a (separate) run with debug=True, we can see if any operators in the model are unsupported, which can also affect performance.
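For instance, a minimal sketch of that separate debug run, reusing m and x from the example above:

# Same compile call as above, but with debug logging enabled so the logs
# report any operators that fall back to PyTorch instead of converting to TRT.
opt_model = torch_tensorrt.compile(m, ir="dynamo", inputs=[x], enabled_precisions={torch.half}, optimization_level=5, debug=True)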
Thanks for your reply! I took your advice, but it seems that ir="dynamo" and optimization_level=5 give even worse performance than before. I'm sorry, but due to the internet access controls on the server, I can't share the log file. However, my code is fully displayed above; perhaps you can make a copy and run it to try it out.
Thanks for the follow-up. It appears we have full coverage for that model and all of the operators are effectively converted to TRT. I would also suggest using the latest nightly version of Torch-TRT for the most up-to-date performance additions, which can be installed from source or via pip:
pip install --pre torch torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121
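As a quick sanity check after installing, a minimal snippet to confirm which builds are actually being imported:

import torch
import torch_tensorrt

# Both should report nightly/dev version strings if the install succeeded.
print(torch.__version__, torch_tensorrt.__version__)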
❓ Question
I am running inside the nvcr.io/nvidia/pytorch:23.12-py3 container. The performance of torch_tensorrt is worse than inductor. Details:
example code
result
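For context, a minimal sketch of this kind of side-by-side comparison; the model (resnet50), input shape, and run_model timing helper here are hypothetical stand-ins for the linked example code:

import time
import torch
import torch_tensorrt
import torchvision.models as models

def run_model(x, m, iters=100):
    # Warm up, then report mean latency in seconds with CUDA synchronization.
    for _ in range(10):
        m(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        m(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

model = models.resnet50().half().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half)

with torch.no_grad():
    inductor_model = torch.compile(model)  # torch.compile defaults to the inductor backend
    print(f"inductor fp16 time: {run_model(x, inductor_model)}")

    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[x], enabled_precisions={torch.half})
    print(f"trt fp16 time: {run_model(x, trt_model)}")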
What you have already tried
Environment
How you installed PyTorch (conda, pip, libtorch, source):
Additional context