Closed: collinmccarthy closed this issue 1 year ago
If you enable_profiling in a TRTModule it will dump tracing. With the recent update to main, this can be enabled for models using the TorchScript frontend as well.
For TorchScript, it's a matter of finding all attributes that are of type __torch__.classes.tensorrt.Engine; for FX, it's a matter of finding TRTModule (or TRTModuleNext, if you are using use_experimental_fx_runtime). Then it's just a matter of calling the enable_profiling method and running.
For example, in TorchScript, if you have a graph like the following for a module instance called trt_mod:
graph(%self_1 : __torch__.Model_trt,
      %input_0 : Tensor):
%__torch___Model_trt_engine_ : __torch__.torch.classes.tensorrt.Engine = prim::GetAttr[name="__torch___Model_trt_engine_"](%self_1)
%3 : Tensor[] = prim::ListConstruct(%input_0)
%4 : Tensor[] = tensorrt::execute_engine(%3, %__torch___Model_trt_engine_)
%5 : Tensor = prim::ListUnpack(%4)
return (%5)
You can do trt_mod.__torch___Model_trt_engine_.enable_profiling() and then run the module.
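If you'd rather not hard-code the attribute name, here is a rough sketch of doing the search programmatically. It assumes the engine attributes are reachable via getattr on the compiled module and that their names contain "engine", as in the graph above, so treat it as a starting point rather than a guaranteed API:

def enable_trt_profiling(trt_mod):
    # Best-effort: walk the compiled module's attributes, find anything that
    # looks like a TensorRT engine, and turn profiling on for it.
    for name in dir(trt_mod):
        if "engine" not in name:
            continue
        attr = getattr(trt_mod, name, None)
        if hasattr(attr, "enable_profiling"):
            attr.enable_profiling()

# With the FX frontend, TRTModule / TRTModuleNext expose enable_profiling()
# directly, so you can just call it on those modules instead.
enable_trt_profiling(trt_mod)  # TorchScript frontend output
trt_mod(images)                # layer timings are recorded during this run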
The current standard FX TRTModule will print out layer timings when you enable profiling; the experimental runtime and TorchScript will save the layer timings to a JSON file you can visualize with Perfetto, as well as print them out.
Thank you @narendasan this is very helpful.
If you enable_profiling in a TRTModule it will dump tracing. With the recent update to main, this can be enabled for models using the TorchScript frontend as well.
Can you point me to something that would show how to do this for the TorchScript front end? For my INT8 models I'm using PTQ like:
trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[images],
    enabled_precisions={torch.half},
    device={
        "device_type": torch_tensorrt.DeviceType.GPU,
        "gpu_id": 0,
        "dla_core": 0,
        "allow_gpu_fallback": False,
        "disable_tf32": False,
    },
    debug=debug,
)
I would love to profile this without having to figure out the tensorrt.Engine attribute in a hacky way (so it works for any model).
Also, is there a way to return a dictionary with the timings rather than dump them to a file? Or if not, is there a way to specify / get the JSON filepath it's being dumped to? I don't see any JSON file being output.
For profiling with the TS front end, I see TRTModuleNext (is this the unified front end?) but I don't see how to get the engine from the PTQ workflow without searching for the attribute name in the graph string representation.
Is there a way to use PTQ with FX via the new unified front end, so I could stick to FX for everything (including profiling)? The FX profiler is great, and easy to customize! I just need to be able to do anything/everything that I could do with torch_tensorrt.compile() via TRTInterpreter instead.
Thank you @narendasan this is very helpful.
If you enable_profiling in a TRTModule it will dump tracing. With the recent update to main, this can be enabled for models using the TorchScript frontend as well.
Can you point me to something that would show how to do this for the TorchScript front end? For my INT8 models I'm using PTQ like:
trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[images],
    enabled_precisions={torch.half},
    device={
        "device_type": torch_tensorrt.DeviceType.GPU,
        "gpu_id": 0,
        "dla_core": 0,
        "allow_gpu_fallback": False,
        "disable_tf32": False,
    },
    debug=debug,
)
I would love to profile this without having to figure out the tensorrt.Engine attribute in a hacky way (so it works for any model).
Right now there isn't a way to tell the runtime to enable profiling at compile time, since compilation and the runtime are mostly separate. For this first version, unfortunately, the "hacky" way of looking for attributes is what we have. I could totally see us adding a context manager where you could do something like
with torch_tensorrt.execution_profiling_enabled(module=trt_mod, profile_path="x"):
    output = trt_mod(input)
where the manager would go through the module for you and enable profiling.
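To be clear, that context manager doesn't exist yet; a hypothetical sketch of what it could look like, reusing the same attribute search as above:

import contextlib

@contextlib.contextmanager
def execution_profiling_enabled(module, profile_path=None):
    # Hypothetical helper, not a current torch_tensorrt API: find the engine
    # attributes, optionally point their traces at profile_path, and enable
    # profiling for the duration of the block.
    engines = []
    for name in dir(module):
        attr = getattr(module, name, None)
        if "engine" in name and hasattr(attr, "enable_profiling"):
            engines.append(attr)
    for engine in engines:
        if profile_path is not None:
            engine.profile_path_prefix = profile_path
        engine.enable_profiling()
    try:
        yield module
    finally:
        for engine in engines:
            if hasattr(engine, "disable_profiling"):
                engine.disable_profiling()

# Usage:
#   with execution_profiling_enabled(trt_mod, profile_path="/tmp/trt_traces"):
#       output = trt_mod(images)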
Also, is there a way to return a dictionary with the timings rather than dump them to a file? Or if not, is there a way to specify / get the JSON filepath it's being dumped to? I don't see any JSON file being output.
Right now, only the JSON trace format is supported (engine layer execution timings can also be dumped to the console at the appropriate logging level, INFO). The location of these traces can be set as a field of the torch.classes.tensorrt.Engine object (e.g. trt_mod.__torch___Model_trt_engine_.profile_path_prefix = "<ABSOLUTE PATH TO DIR>").
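Put together, something like this (the attribute name and path are just placeholders for whatever your model actually has):

engine = trt_mod.__torch___Model_trt_engine_       # engine attribute from your model's graph
engine.profile_path_prefix = "/tmp/trt_profiles"   # directory the JSON traces are written under
engine.enable_profiling()
trt_mod(images)                                    # traces are produced by this run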
For profiling with the TS front end, I see TRTModuleNext (is this the unified front end?) but I don't see how to get the engine from the PTQ workflow without searching for the attribute name in the graph string representation. Is there a way to use PTQ with FX via the new unified front end, so I could stick to FX for everything (including profiling)? The FX profiler is great, and easy to customize! I just need to be able to do anything/everything that I could do with torch_tensorrt.compile() via TRTInterpreter instead.
TRTModuleNext will be the class of module returned to you if you use the use_experimental_fx_runtime feature, which uses the unified runtime under the hood.
I don't think FX exposes a place for your calibrator right now, but it is something that is easy to hack in (I just haven't had time to do it properly yet, and I'm not sure how it would work for split graphs). However, the DataLoaderCalibrator shipped in torch_tensorrt.ptq is fully compatible with the TRT Python API, so you should be able to set it as part of the builder in TRTInterpreter.
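Roughly, the calibrator side would look like the sketch below. DataLoaderCalibrator and CalibrationAlgo are existing torch_tensorrt.ptq pieces, but exactly where it gets wired into TRTInterpreter's builder config is an assumption, since that hook isn't exposed yet:

import torch
import torch_tensorrt

calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    calib_dataloader,                                  # your calibration DataLoader
    cache_file="./calibration.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

# The calibrator implements TensorRT's IInt8Calibrator, so on the TRT side it
# would be attached to the builder config that TRTInterpreter creates, e.g.:
#   builder_config.set_flag(trt.BuilderFlag.INT8)
#   builder_config.int8_calibrator = calibrator
# Exposing that from TRTInterpreter.run() is the part you'd have to hack in.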
Right now there isn't a way to tell the runtime to enable profiling at compile time, since compilation and the runtime are mostly separate. For this first version, unfortunately, the "hacky" way of looking for attributes is what we have. I could totally see us adding a context manager where you could do something like
with torch_tensorrt.execution_profiling_enabled(module=trt_mod, profile_path="x"):
    output = trt_mod(input)
where the manager would go through the module for you and enable profiling.
That would be awesome. For now I can make the "hacky" way work.
Also, is there a way to return a dictionary with the timings rather than dump them to a file? Or if not, is there a way to specify / get the JSON filepath it's being dumped to? I don't see any JSON file being output.
Right now, only the JSON trace format is supported (engine layer execution timings can also be dumped to the console at the appropriate logging level, INFO). The location of these traces can be set as a field of the torch.classes.tensorrt.Engine object (e.g. trt_mod.__torch___Model_trt_engine_.profile_path_prefix = "<ABSOLUTE PATH TO DIR>").
Sounds good.
TRTModuleNext will be the class of module returned to you if you use the use_experimental_fx_runtime feature, which uses the unified runtime under the hood.
Got it, I'll try this out and see how my results differ for FX / Experimental FX / TS and open a separate question for this if I'm confused or the results differ significantly. As of right now I think #1452 will cause FX and maybe the experimental FX to give worse results.
I don't think FX exposes a place for your calibrator right now, but it is something that is easy to hack in (I just haven't had time to do it properly yet, and I'm not sure how it would work for split graphs). However, the DataLoaderCalibrator shipped in torch_tensorrt.ptq is fully compatible with the TRT Python API, so you should be able to set it as part of the builder in TRTInterpreter.
Okay, I dug into the code a bit more and I get it now. Once #1452 is resolved I'll see if I can make this work.
I really appreciate all the help here, thank you so much. If I don't have any more profiling-related questions in the next week or so I'll close the issue.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
❓ Question
When I'm not using TensorRT, I run my model through an FX interpreter that times each call op (by inserting CUDA events before/after and measuring the elapsed time). I'd like to do something similar after converting/compiling the model to TensorRT, and I see there is some profiling built in with tensorrt.Profiler, but its usage isn't clear to me.
Is there an example anywhere on how to time each layer or op with this profiler, or any other means of profiling the TensorRT engine/layers? I don't mind messing with the op converters to do so, but I don't want to have to wrap every op converter my model uses. More generally, I think I could use the PyTorch profiler, but it would be difficult to parse the output to get clear per-layer/per-op results.
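For reference, the bare TensorRT route I was looking at is something like the sketch below (the standard IProfiler pattern, assuming direct access to the execution context, which is the part that isn't obvious once Torch-TensorRT owns the engine):

import tensorrt as trt

class PerLayerTimer(trt.IProfiler):
    # Collects the per-layer times (ms) that TensorRT reports after each execute.
    def __init__(self):
        super().__init__()
        self.layer_times_ms = {}

    def report_layer_time(self, layer_name, ms):
        self.layer_times_ms[layer_name] = self.layer_times_ms.get(layer_name, 0.0) + ms

# Attach it to an execution context you control, then run inference as usual:
#   context = engine.create_execution_context()
#   context.profiler = PerLayerTimer()
#   context.execute_v2(bindings)
#   print(context.profiler.layer_times_ms)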