mlcommons / inference_results_v2.0

This repository contains the results and code for the MLPerf™ Inference v2.0 benchmark.
https://mlcommons.org/en/inference-datacenter-20/
Apache License 2.0

Build Engine failed #20

Open khushbuKinara opened 1 year ago

khushbuKinara commented 1 year ago

Hi,

I am trying to run the MLPerf benchmark for resnet50 on an NVIDIA Xavier NX with CUDA 10.2. I get the following error while building the engine:

```
jetson@yahboom:~/projects/inference_results_v2.0/closed/NVIDIA$ make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline"
make[1]: Entering directory '/home/jetson/projects/inference_results_v2.0/closed/NVIDIA'
[2023-06-13 01:59:19,037 main.py:770 INFO] Detected System ID: KnownSystem.Xavier_NX
[2023-06-13 01:59:22,040 main.py:108 INFO] Building engines for resnet50 benchmark in Offline scenario...
[2023-06-13 01:59:22,048 main.py:117 INFO] Building DLA engine for Xavier_NX_resnet50_Offline
[2023-06-13 01:59:22,140 ResNet50.py:39 INFO] Using workspace size: 1073741824
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +346, GPU +0, now: CPU 385, GPU 6745 (MiB)
[2023-06-13 01:59:24,017 builder.py:106 INFO] Using DLA: Core 0
[2023-06-13 01:59:27,339 rn50_graphsurgeon.py:448 INFO] Renaming layers
[2023-06-13 01:59:27,340 rn50_graphsurgeon.py:459 INFO] Renaming tensors
[2023-06-13 01:59:27,341 rn50_graphsurgeon.py:728 INFO] Adding Squeeze
[2023-06-13 01:59:27,342 rn50_graphsurgeon.py:763 INFO] Adding Conv layer, instead of FC
[2023-06-13 01:59:27,345 rn50_graphsurgeon.py:784 INFO] Adding TopK layer
[2023-06-13 01:59:27,345 rn50_graphsurgeon.py:801 INFO] Removing obsolete layers
[2023-06-13 01:59:28,972 ResNet50.py:96 INFO] Unmarking output: topk_layer_output_value
[TensorRT] WARNING: DynamicRange(min: -128, max: 127). Dynamic range should be symmetric for better accuracy.
[2023-06-13 01:59:28,973 builder.py:176 INFO] Building ./build/engines/Xavier_NX/resnet50/Offline/resnet50-Offline-dla-b32-int8.lwis_k_99_MaxP.plan
[TensorRT] WARNING: Default DLA is enabled but layer topk_layer is not supported on DLA, falling back to GPU.
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 868 MiB, GPU 7107 MiB
[TensorRT] INFO: Reading Calibration Cache for calibrator: EntropyCalibration2
[TensorRT] INFO: Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[TensorRT] INFO: To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[TensorRT] INFO: ---------- Layers Running on DLA ----------
[TensorRT] INFO: [DlaLayer] {ForeignNode[conv1...fc_replaced]}
[TensorRT] INFO: ---------- Layers Running on GPU ----------
[TensorRT] INFO: [GpuLayer] topk_layer
[TensorRT] ERROR: 2: [eglUtils.cpp::operator()::99] Error Code 2: Internal Error (Assertion (eglCreateStreamKHR) != nullptr failed.)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jetson/projects/inference_results_v2.0/closed/NVIDIA/code/main.py", line 123, in handle_generate_engine
    b.build_engines()
  File "/home/jetson/projects/inference_results_v2.0/closed/NVIDIA/code/common/builder.py", line 207, in build_engines
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'
Traceback (most recent call last):
  File "code/main.py", line 772, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "code/main.py", line 744, in main
    dispatch_action(main_args, config_dict, workload_id, equiv_engine_setting=equiv_engine_setting)
  File "code/main.py", line 553, in dispatch_action
    launch_handle_generate_engine(*_gen_args, **_gen_kwargs)
  File "code/main.py", line 92, in launch_handle_generate_engine
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
Makefile:692: recipe for target 'generate_engines' failed
make[1]: *** [generate_engines] Error 1
make[1]: Leaving directory '/home/jetson/projects/inference_results_v2.0/closed/NVIDIA'
Makefile:686: recipe for target 'run' failed
make: *** [run] Error 2
```
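Note that the `AttributeError: 'NoneType' object has no attribute 'serialize'` is a secondary symptom: the traceback shows `builder.py` calling `engine.serialize()` without checking whether the TensorRT build succeeded, so the real failure is the EGL assertion a few lines earlier. A minimal, hypothetical sketch (not the repository's actual code; `serialize_engine` is an illustrative name) of a defensive check that would surface the underlying error instead:

```python
def serialize_engine(engine):
    """Serialize a built engine, failing loudly if the build returned None.

    Hypothetical helper: TensorRT's engine-build APIs return None on
    failure, so calling .serialize() unchecked produces an AttributeError
    that hides the real error printed in the TensorRT log.
    """
    if engine is None:
        # The root cause is in the TensorRT ERROR lines above
        # (here, the eglCreateStreamKHR assertion), not in serialization.
        raise RuntimeError(
            "Engine build returned None; check the TensorRT ERROR output "
            "above for the root cause."
        )
    return engine.serialize()
```

With a check like this, the build still fails, but with a message pointing at the TensorRT log rather than an unrelated `AttributeError`.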

When I try to build it with

```
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline --gpu_only"
```

it works fine. But since everything then runs on the GPU only, the performance does not match the published results.

Can somebody please help me resolve the above error? Thanks in advance.

nvyihengz commented 1 year ago

Based on the logs and the behavior, the error comes from the DLA path. There are some similar questions asked in the Jetson forum.