marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

How to use DLA to build engine #27

Closed c7934597 closed 2 years ago

c7934597 commented 3 years ago

https://forums.developer.nvidia.com/t/how-to-use-dla-in-deepstream-yolov5/161550/25

Hi @marcoslucianops, I used deepstream-yolov4 and checked the engine: it was built on the GPU. I saw the article above. How do I modify the following code so the engine is built for the DLA?

https://github.com/marcoslucianops/DeepStream-Yolo/blob/470ed82658a5546b55185b3223f8057ecf54cf88/native/nvdsinfer_custom_impl_Yolo/yolo.cpp#L74-L81

marcoslucianops commented 3 years ago

Hi, edit the nvinfer1::ICudaEngine *Yolo::createEngine (nvinfer1::IBuilder* builder) function (lines 61-90) in the yolo.cpp file to:

nvinfer1::ICudaEngine *Yolo::createEngine (nvinfer1::IBuilder* builder)
{
    assert (builder);

    if (m_DeviceType == "kDLA") {
        builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    }

    std::vector<float> weights = loadWeights(m_WtsFilePath, m_NetworkType);
    std::vector<nvinfer1::Weights> trtWeights;

    nvinfer1::INetworkDefinition *network = builder->createNetwork();
    if (parseModel(*network) != NVDSINFER_SUCCESS) {
        network->destroy();
        return nullptr;
    }

    // Build the engine
    std::cout << "Building the TensorRT Engine" << std::endl;
    nvinfer1::ICudaEngine * engine = builder->buildCudaEngine(*network);
    if (engine) {
        std::cout << "Building complete\n" << std::endl;
    } else {
        std::cerr << "Building engine failed\n" << std::endl;
    }

    // destroy
    network->destroy();
    return engine;
}
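
If you also want unsupported layers to fall back to the GPU, the same branch can enable GPU fallback and FP16 with the old IBuilder API. This is a sketch only (not tested on a board); the calls are standard pre-TensorRT-8 IBuilder methods, and note that the DLA does not run FP32, so FP16 (or INT8) must be enabled:

```cpp
// Sketch (assumption, untested): extra IBuilder settings for DLA builds.
if (m_DeviceType == "kDLA") {
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    builder->allowGPUFallback(true);  // run unsupported layers on the GPU
    builder->setFp16Mode(true);       // DLA does not support FP32
}
```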

and add these lines to the config_infer_primary.txt file (in the [property] section):

enable-dla=1
use-dla-core=0

Note: adjust these values for your setup.

I don't have a Xavier board to test on. Please tell me if it works.
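
For reference, a fuller sketch of the [property] section with these keys in context (the network-mode line is an addition of mine, not from this thread: DeepStream's nvinfer uses 0=FP32, 1=INT8, 2=FP16, and the DLA cannot run FP32, so FP16 or INT8 is required):

```ini
[property]
# ... existing model/weights settings ...
network-mode=2   # FP16; DLA does not support FP32
enable-dla=1     # build/run the engine on the DLA
use-dla-core=0   # DLA core index (Xavier has cores 0 and 1)
```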

c7934597 commented 3 years ago

It runs on Xavier, but it seems some layers fail to convert to DLA.

I used tegrastats and jtop to check the effect; it is the same as the original (GPU-only) build.


Building YOLO network complete
Building the TensorRT Engine
ERROR: [TRT]: leaky_1: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_1 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_2: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_2 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_3: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_3 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_3: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_3 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 10) [Slice] is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_5: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_5 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_6: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_6 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_8: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_8 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_11: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_11 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_11: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_11 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 27) [Slice] is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_13: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_13 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_14: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_14 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_16: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_16 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_19: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_19 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_19: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_19 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 44) [Slice] is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_21: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_21 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_22: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_22 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_24: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_24 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_27: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_27 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_28: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_28 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_29: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_29 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer yolo_31 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_31: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_31 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_33: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_33 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer preMul_33 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer postMul_33 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer mm1_33 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer mm2_33 is not supported on DLA, falling back to GPU.
ERROR: [TRT]: leaky_36: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
WARNING: [TRT]: Default DLA is enabled but layer leaky_36 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer yolo_38 is not supported on DLA, falling back to GPU.
INFO: [TRT]: mm1_33: broadcasting input0 to make tensors conform, dims(input0)=[1,26,13][NONE] dims(input1)=[128,13,13][NONE].
INFO: [TRT]: mm2_33: broadcasting input1 to make tensors conform, dims(input0)=[128,26,13][NONE] dims(input1)=[1,13,26][NONE].
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_6
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_8
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_13
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_14
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_16
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_21
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_22
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_24
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_28
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_36
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_30
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_37
INFO: [TRT]:
INFO: [TRT]: --------------- Layers running on DLA:
INFO: [TRT]: {conv_1,batch_norm_1}, {conv_2,batch_norm_2}, {conv_3,batch_norm_3}, {conv_5,batch_norm_5}, {maxpool_10,conv_11,batch_norm_11}, {maxpool_18,conv_19,batch_norm_19}, {maxpool_26,conv_27,batch_norm_27}, {conv_29,batch_norm_29,conv_33,batch_norm_33},
INFO: [TRT]: --------------- Layers running on GPU:
INFO: [TRT]: preMul_33, postMul_33, leaky_1, leaky_2, leaky_3, (Unnamed Layer* 10) [Slice], leaky_5, conv_6, leaky_6, conv_8, leaky_8, (Unnamed Layer* 8) [Activation]_output copy, leaky_11, (Unnamed Layer* 27) [Slice], conv_13, leaky_13, conv_14, leaky_14, conv_16, leaky_16, (Unnamed Layer* 25) [Activation]_output copy, leaky_19, (Unnamed Layer* 44) [Slice], conv_21, leaky_21, conv_22, leaky_22, conv_24, leaky_24, (Unnamed Layer* 42) [Activation]_output copy, leaky_27, conv_28, leaky_28, leaky_29, leaky_33, mm1_33, mm2_33, conv_30, (Unnamed Layer* 75) [Matrix Multiply]_output copy, (Unnamed Layer* 54) [Activation]_output copy, conv_36, yolo_31, leaky_36, conv_37, yolo_38,
INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: [TRT]: Detected 1 inputs and 2 output network tensors.
Building complete

0:02:48.599896399 13265 0x7f20002300 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-5.0/sources/yolo/model_b1_dla0_fp16.engine successfully
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT  kFLOAT data    3x416x416
1 OUTPUT kFLOAT yolo_31 24x13x13
2 OUTPUT kFLOAT yolo_38 24x26x26

0:02:48.925054321 13265 0x7f20002300 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.0/sources/yolo/config_infer_primary.txt sucessfully

marcoslucianops commented 3 years ago

Some layers can't run on the DLA core; it's a TensorRT limitation. I think support will be added in future releases of TensorRT/DeepStream SDK.

satyajitghana commented 3 years ago

What could be the reason for the following error?

I'm trying to run on a Xavier NX.

ERROR: [TRT]: ../rtExt/dla/native/dlaExecuteRunner.cpp (135) - Assertion Error in updateContextResources: 0 (execParams.dlaCore >= 0 && execParams.dlaCore < core->numEngines())
Building engine failed

Failed to build CUDA engine on yolov4-tiny-3l-1024.cfg
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:06.471895338 18672   0x55c26e8f30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1735> [UID = 1]: build engine file failed
0:00:06.472071755 18672   0x55c26e8f30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1821> [UID = 1]: build backend context failed
0:00:06.472165676 18672   0x55c26e8f30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1148> [UID = 1]: generate backend failed, check config file settings
0:00:06.472758673 18672   0x55c26e8f30 WARN                 nvinfer gstnvinfer.cpp:810:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:06.472804529 18672   0x55c26e8f30 WARN                 nvinfer gstnvinfer.cpp:810:gst_nvinfer_start:<primary_gie> error: Config file path: /opt/nvidia/deepstream/deepstream-5.0/sources/yolo/config_infer_primary.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: <main:655>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(810): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-5.0/sources/yolo/config_infer_primary.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed
marcoslucianops commented 3 years ago

Did you follow these steps? You need to recompile nvdsinfer_custom_impl_Yolo too.

satyajitghana commented 3 years ago

@marcoslucianops Yes! It seems that with batch-size=1 I am able to use the DLA, but with batch-size >= 2 it doesn't work.

INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: [TRT]: Detected 1 inputs and 3 output network tensors.
ERROR: [TRT]: ../builder/cudnnBuilder2.cpp (1757) - Assertion Error in operator(): 0 (et.region->getType() == RegionType::kNVM)
Building engine failed

Failed to build CUDA engine on drone_tiny_3l_1024_test.cfg
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:03:12.310730296 10901     0x31115e30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1735> [UID = 1]: build engine file failed
0:03:12.310807833 10901     0x31115e30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1821> [UID = 1]: build backend context failed
0:03:12.310911066 10901     0x31115e30 ERROR                nvinfer gstnvinfer.cpp:614:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1148> [UID = 1]: generate backend failed, check config file settings
0:03:12.311629282 10901     0x31115e30 WARN                 nvinfer gstnvinfer.cpp:810:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:03:12.311680930 10901     0x31115e30 WARN                 nvinfer gstnvinfer.cpp:810:gst_nvinfer_start:<primary_gie> error: Config file path: /opt/nvidia/deepstream/deepstream-5.0/sources/yolo_60fps_awesome_tracking/config_infer_primary.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
willosonico commented 2 years ago

Is this fix still up to date? The function prototype

nvinfer1::ICudaEngine *Yolo::createEngine (nvinfer1::IBuilder* builder)

doesn't match

nvinfer1::ICudaEngine *Yolo::createEngine (nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config)
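
With the newer prototype, the DLA settings move from IBuilder to IBuilderConfig. A hedged sketch of the equivalent change (untested on a board; the calls are standard TensorRT 8-era IBuilderConfig APIs, but this exact integration into yolo.cpp is my assumption, not the repo's code):

```cpp
// Sketch only: DLA setup inside the IBuilderConfig-based createEngine.
if (m_DeviceType == "kDLA") {
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(0);  // which DLA core to use
    // DLA does not support FP32; build in FP16 (or INT8 with calibration)
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    // let unsupported layers fall back to the GPU instead of failing
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
}
nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
```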

marcoslucianops commented 2 years ago

I don't know if it will work on DLA because I don't have a board to test on. Can you check, please?

willosonico commented 2 years ago

I am checking, but it doesn't compile because the method in the new version has a different prototype. So maybe the code was updated and the fix you provided a while ago won't work anymore?

marcoslucianops commented 2 years ago

Can you test adding only these lines to the config_infer_primary.txt file (in the [property] section)?

enable-dla=1
use-dla-core=0
willosonico commented 2 years ago

It seems to ignore it; in fact, it uses model_b2_gpu0_fp16.engine:

Deserialize yoloLayer plugin: yolo_93
Deserialize yoloLayer plugin: yolo_96
Deserialize yoloLayer plugin: yolo_99
Running Healthcheck
0:00:07.012733708 22422 0x7f8c82a550 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() [UID = 1]: deserialized trt engine from :/home/aaeon/sdcard/fogsphere-edge/fogsphere-engine-py/model_b2_gpu0_fp16.engine
INFO: [Implicit Engine Info]: layers num: 4
0 INPUT  kFLOAT data    3x640x640
1 OUTPUT kFLOAT yolo_93 255x80x80
2 OUTPUT kFLOAT yolo_96 255x40x40
3 OUTPUT kFLOAT yolo_99 255x20x20

marcoslucianops commented 2 years ago

Can you send the output when the model is building?

willosonico commented 2 years ago

In fact, the engine fails to build. I deleted the .engine file, and with batch-size=2 I got the following output:

ERROR: Deserialize engine failed because file path: /home/aaeon/sdcard/fogsphere-edge/fogsphere-engine-py/model_b2_gpu0_fp16.engine open error
0:00:02.280438983 7734 0x7f7882a350 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() [UID = 1]: deserialize engine from file :/home/aaeon/sdcard/fogsphere-edge/fogsphere-engine-py/model_b2_gpu0_fp16.engine failed
0:00:02.280554183 7734 0x7f7882a350 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() [UID = 1]: deserialize backend context from engine from file :/home/aaeon/sdcard/fogsphere-edge/fogsphere-engine-py/model_b2_gpu0_fp16.engine failed, try rebuild
0:00:02.280615048 7734 0x7f7882a350 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() [UID = 1]: Trying to create engine from model files

Loading pre-trained weights
Running Healthcheck
Loading weights of /home/aaeon/sdcard/fogsphere-edge/fogsphere-engine-py/fogsphere/deepstream/models/yolov5s6/yolov5s complete
Total weights read: 7254397
Building YOLO network

  layer                        input               output         weightPtr

(0) conv_silu 3 x 640 x 640 32 x 320 x 320 3584
(1) conv_silu 32 x 320 x 320 64 x 160 x 160 22272
(2) conv_silu 64 x 160 x 160 32 x 160 x 160 24448
(3) route - 64 x 160 x 160 24448
(4) conv_silu 64 x 160 x 160 32 x 160 x 160 26624
(5) conv_silu 32 x 160 x 160 32 x 160 x 160 27776
(6) conv_silu 32 x 160 x 160 32 x 160 x 160 37120
(7) shortcut_linear: 4 - 32 x 160 x 160 -
(8) route - 64 x 160 x 160 37120
(9) conv_silu 64 x 160 x 160 64 x 160 x 160 41472
(10) conv_silu 64 x 160 x 160 128 x 80 x 80 115712
(11) conv_silu 128 x 80 x 80 64 x 80 x 80 124160
(12) route - 128 x 80 x 80 124160
(13) conv_silu 128 x 80 x 80 64 x 80 x 80 132608
(14) conv_silu 64 x 80 x 80 64 x 80 x 80 136960
(15) conv_silu 64 x 80 x 80 64 x 80 x 80 174080
(16) shortcut_linear: 13 - 64 x 80 x 80 -
(17) conv_silu 64 x 80 x 80 64 x 80 x 80 178432
(18) conv_silu 64 x 80 x 80 64 x 80 x 80 215552
(19) shortcut_linear: 16 - 64 x 80 x 80 -
(20) route - 128 x 80 x 80 215552
(21) conv_silu 128 x 80 x 80 128 x 80 x 80 232448
(22) conv_silu 128 x 80 x 80 256 x 40 x 40 528384
(23) conv_silu 256 x 40 x 40 128 x 40 x 40 561664
(24) route - 256 x 40 x 40 561664
(25) conv_silu 256 x 40 x 40 128 x 40 x 40 594944
(26) conv_silu 128 x 40 x 40 128 x 40 x 40 611840
(27) conv_silu 128 x 40 x 40 128 x 40 x 40 759808
(28) shortcut_linear: 25 - 128 x 40 x 40 -
(29) conv_silu 128 x 40 x 40 128 x 40 x 40 776704
(30) conv_silu 128 x 40 x 40 128 x 40 x 40 924672
(31) shortcut_linear: 28 - 128 x 40 x 40 -
(32) conv_silu 128 x 40 x 40 128 x 40 x 40 941568
(33) conv_silu 128 x 40 x 40 128 x 40 x 40 1089536
(34) shortcut_linear: 31 - 128 x 40 x 40 -
(35) route - 256 x 40 x 40 1089536
(36) conv_silu 256 x 40 x 40 256 x 40 x 40 1156096
(37) conv_silu 256 x 40 x 40 512 x 20 x 20 2337792
(38) conv_silu 512 x 20 x 20 256 x 20 x 20 2469888
(39) route - 512 x 20 x 20 2469888
(40) conv_silu 512 x 20 x 20 256 x 20 x 20 2601984
(41) conv_silu 256 x 20 x 20 256 x 20 x 20 2668544
(42) conv_silu 256 x 20 x 20 256 x 20 x 20 3259392
(43) shortcut_linear: 40 - 256 x 20 x 20 -
(44) route - 512 x 20 x 20 3259392
(45) conv_silu 512 x 20 x 20 512 x 20 x 20 3523584
(46) conv_silu 512 x 20 x 20 256 x 20 x 20 3655680
(47) maxpool 256 x 20 x 20 256 x 20 x 20 3655680
(48) maxpool 256 x 20 x 20 256 x 20 x 20 3655680
(49) maxpool 256 x 20 x 20 256 x 20 x 20 3655680
(50) route - 1024 x 20 x 20 3655680
(51) conv_silu 1024 x 20 x 20 512 x 20 x 20 4182016
(52) conv_silu 512 x 20 x 20 256 x 20 x 20 4314112
(53) upsample 256 x 20 x 20 256 x 40 x 40 -
(54) route - 512 x 40 x 40 4314112
(55) conv_silu 512 x 40 x 40 128 x 40 x 40 4380160
(56) route - 512 x 40 x 40 4380160
(57) conv_silu 512 x 40 x 40 128 x 40 x 40 4446208
(58) conv_silu 128 x 40 x 40 128 x 40 x 40 4463104
(59) conv_silu 128 x 40 x 40 128 x 40 x 40 4611072
(60) route - 256 x 40 x 40 4611072
(61) conv_silu 256 x 40 x 40 256 x 40 x 40 4677632
(62) conv_silu 256 x 40 x 40 128 x 40 x 40 4710912
(63) upsample 128 x 40 x 40 128 x 80 x 80 -
(64) route - 256 x 80 x 80 4710912
(65) conv_silu 256 x 80 x 80 64 x 80 x 80 4727552
(66) route - 256 x 80 x 80 4727552
(67) conv_silu 256 x 80 x 80 64 x 80 x 80 4744192
(68) conv_silu 64 x 80 x 80 64 x 80 x 80 4748544
(69) conv_silu 64 x 80 x 80 64 x 80 x 80 4785664
(70) route - 128 x 80 x 80 4785664
(71) conv_silu 128 x 80 x 80 128 x 80 x 80 4802560
(72) conv_silu 128 x 80 x 80 128 x 40 x 40 4950528
(73) route - 256 x 40 x 40 4950528
(74) conv_silu 256 x 40 x 40 128 x 40 x 40 4983808
(75) route - 256 x 40 x 40 4983808
(76) conv_silu 256 x 40 x 40 128 x 40 x 40 5017088
(77) conv_silu 128 x 40 x 40 128 x 40 x 40 5033984
(78) conv_silu 128 x 40 x 40 128 x 40 x 40 5181952
(79) route - 256 x 40 x 40 5181952
(80) conv_silu 256 x 40 x 40 256 x 40 x 40 5248512
(81) conv_silu 256 x 40 x 40 256 x 20 x 20 5839360
(82) route - 512 x 20 x 20 5839360
(83) conv_silu 512 x 20 x 20 256 x 20 x 20 5971456
(84) route - 512 x 20 x 20 5971456
(85) conv_silu 512 x 20 x 20 256 x 20 x 20 6103552
(86) conv_silu 256 x 20 x 20 256 x 20 x 20 6170112
(87) conv_silu 256 x 20 x 20 256 x 20 x 20 6760960
(88) route - 512 x 20 x 20 6760960
(89) conv_silu 512 x 20 x 20 512 x 20 x 20 7025152
(90) route - 128 x 80 x 80 7025152
(91) conv_logistic 128 x 80 x 80 255 x 80 x 80 7058047
(92) yolo 255 x 80 x 80 255 x 80 x 80 7058047
(93) route - 256 x 40 x 40 7058047
(94) conv_logistic 256 x 40 x 40 255 x 40 x 40 7123582
(95) yolo 255 x 40 x 40 255 x 40 x 40 7123582
(96) route - 512 x 20 x 20 7123582
(97) conv_logistic 512 x 20 x 20 255 x 20 x 20 7254397
(98) yolo 255 x 20 x 20 255 x 20 x 20 7254397
Output YOLO blob names: yolo_93 yolo_96 yolo_99
Total number of YOLO layers: 273
Building YOLO network complete
Building the TensorRT Engine

WARNING: [TRT]: route_3: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_3 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_12: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_12 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_24: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_24 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_39: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_39 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer upsample_53 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_56: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_56 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer upsample_63 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_66: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_66 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_75: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_75 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_84: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_84 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_90: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_90 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer yolo_93 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_93: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_93 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer yolo_96 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: route_96: Concatenation on DLA requires at least two inputs.
WARNING: [TRT]: Default DLA is enabled but layer route_96 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer yolo_99 is not supported on DLA, falling back to GPU.
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_1
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer route_54
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer route_64
WARNING: [TRT]: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer conv_98
WARNING: [TRT]: Detected invalid timing cache, setup a local cache instead

ERROR: [TRT]: 2: [nvdlaUtils.cpp::getInputDesc::176] Error Code 2: Internal Error (Assertion idx < num failed. Index is out of range of valid number of input tensors.)
Building engine failed

marcoslucianops commented 2 years ago

@willosonico, it's an issue in TensorRT: https://forums.developer.nvidia.com/t/error-while-building-engine-on-tensorrt8-0-2/202978/3.