grimoire / torch2trt_dynamic

A PyTorch to TensorRT converter with dynamic shape support
MIT License

Consider adding support for DLA (Deep Learning Accelerator) modules #26

Open tehkillerbee opened 2 years ago

tehkillerbee commented 2 years ago

Some platforms, including the Jetson Xavier AGX and NX, support DLA modules. However, when converting a PyTorch module to TensorRT, the converter never attempts to use the DLA cores and always falls back to the GPU. This is apparent from the TensorRT log output:

[TensorRT] INFO: 
[TensorRT] INFO: --------------- Layers running on DLA: 
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Layers running on GPU: 
[TensorRT] INFO: (Unnamed Layer* 26) [Convolution] + (Unnamed Layer* 28) [Activation], 
...

The DLA cores must be enabled through the IBuilderConfig: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/NetworkConfig.html#tensorrt.IBuilderConfig

A good example of how to do this is given by jkjung-avt in this issue: https://github.com/jkjung-avt/tensorrt_demos/issues/463

https://github.com/jkjung-avt/tensorrt_demos/blob/f53b5ae9b004489463a407d8e9b230f39230d051/yolo/onnx_to_tensorrt.py#L165-L170
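
For reference, a minimal sketch of what the DLA-related part of the builder configuration could look like. This is only a sketch: the builder and network objects are assumed to come from the existing conversion flow, and build_engine is the older (pre-TensorRT 10) build API:

import tensorrt as trt

# Assumes `builder` (trt.Builder) and `network` (trt.INetworkDefinition)
# already exist in the conversion code.
config = builder.create_builder_config()

# The DLA only supports FP16 and INT8, so enable a reduced-precision mode.
config.set_flag(trt.BuilderFlag.FP16)

# Route layers to the DLA by default, using core 0.
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# Let layers the DLA cannot run fall back to the GPU.
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

engine = builder.build_engine(network, config)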

grimoire commented 2 years ago

That sounds like a good idea! I will try. But it may take a while, since I first need to find where I hid my Jetson Nano...

tehkillerbee commented 2 years ago

@grimoire I think the DLA cores are only supported on the Jetson Xavier NX and AGX series and the more recent devices listed here

I have been playing around with it today by adding the following lines here:

# Enable the DLA if the platform supports it
config.default_device_type = trt.DeviceType.DLA
# Use the first DLA core
config.DLA_core = 0
# Fall back to the GPU for layers the DLA cannot run
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
# Enforce the requested precision types strictly
config.set_flag(trt.BuilderFlag.STRICT_TYPES)

After doing this, I can see that some layers are now running on the DLA, while other layers are incompatible. I also see a large number of warnings such as the one below. This is odd, since I am already using FP16 mode...

[TensorRT] WARNING: DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 1302) [Shape] device type to GPU.
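
As a side note, it should be possible to check up front which layers the DLA can accept by querying the builder config per layer. A minimal sketch, assuming the network and config objects from the snippet above:

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # IBuilderConfig.can_run_on_DLA reports whether a layer can be
    # assigned to the DLA under the current config
    if config.can_run_on_DLA(layer):
        print("DLA-capable:", layer.name)
    else:
        print("GPU only:", layer.name)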

Unfortunately, the export process never finishes but crashes instead! I suspect it may be caused by the older JetPack version I am currently running, as I can see that some DLA support was added in more recent JetPack versions.

python: ../rtSafe/cuda/cudaReformatRunner.cpp:37: nvinfer1::rt::cuda::ReformatRunner::ReformatRunner(nvinfer1::rt::DefaultRunnerParameters, const ReformatParameters&): Assertion `matchValidDims(defaultParams.inputs[0].extent, defaultParams.outputs[0].extent)' failed.
Aborted (core dumped)
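
For anyone hitting the same issue, a quick way to confirm the TensorRT version shipped with JetPack and how many DLA cores the device exposes (a sketch using the standard TensorRT Python API):

import tensorrt as trt

print("TensorRT version:", trt.__version__)

# Builder.num_DLA_cores reports how many DLA cores the device exposes
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
print("DLA cores available:", builder.num_DLA_cores)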

I'll keep you updated with my progress! :)

grimoire commented 2 years ago

Cool! I will try to find a device that supports DLA. If you find a way to add this, please share it with me. And of course, a PR is welcome!