Closed: ChristinaHsu0115 closed this issue 7 months ago.
Try downgrading transformers to 4.36.2.
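If pip manages the package inside the container, the downgrade itself is one command (a minimal sketch; pip will resolve compatible versions of the dependencies such as tokenizers and huggingface-hub):

```sh
# Pin transformers to the suggested version; dependency resolution is left to pip.
pip install transformers==4.36.2
```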
@ChristinaHsu0115 Please consider renaming the issue. AMD did not submit to v3.1. You are using NVIDIA's code.
/cc @nv-ananjappa @mrmhodak
@lapp0 Thanks for the help. I didn't know exactly how to update transformers to 4.36.2 — it has lots of dependencies (fsspec, tqdm, huggingface, ...) — so I made the change in two steps, as below:
(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-24 12:17:41,391 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:17:43,151 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/24/2024-12:17:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 44, GPU 942 (MiB)
[01/24/2024-12:17:50] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +4333, GPU +1150, now: CPU 4482, GPU 2094 (MiB)
[2024-01-24 12:17:51,765 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/k905_h100_x2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/k905_h100_x2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
[2024-01-24 12:20:20,141 gptj6b.py:122 INFO] Engine built complete and took 148.37598872184753s. Stored at ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[2024-01-24 12:20:20,141 generate_engines.py:176 INFO] Finished building engines for gptj benchmark in Offline scenario.
Time taken to generate engines: 156.99001169204712 seconds
make[1]: Leaving directory '/work'
make[1]: Entering directory '/work'
[2024-01-24 12:20:25,648 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:20:25,751 harness.py:236 INFO] The harness will load 1 plugins: ['build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so']
[2024-01-24 12:20:25,751 generate_conf_files.py:107 INFO] Generated measurements/ entries for k905_h100_x2_TRT/gptj-99/Offline
[2024-01-24 12:20:25,752 __init__.py:46 INFO] Running command: ./build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperflog" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj
[2024-01-24 12:20:25,752 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.GPTJ
buffer_manager_thread_count : 0
coalesced_tensor : True
data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//data
enable_sort : True
gpu_batch_size : 32
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /work/build/logs/2024.01.24-12.17.38
num_sort_segments : 2
offline_expected_qps : 76
precision : fp16
preprocessed_data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.5849335560000002, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584933556000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 2})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='k905_h100_x2')
tensor_parallelism : 1
tensor_path : build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy
use_graphs : False
system_id : k905_h100_x2
config_name : k905_h100_x2_gptj_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
use_cpu : False
use_inferentia : False
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING GPT_HARNESS # ./build/bin/harness_gpt
[I] Loading plugin: build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so
I0124 12:20:26.327747 13788 main_gpt.cc:122] Found 2 GPUs
I0124 12:20:27.282594 13788 gpt_server.cc:215] Loading 1 engine(s)
I0124 12:20:27.282637 13788 gpt_server.cc:218] Engine Path: ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +66, now: CPU 35086, GPU 12554 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +72, now: CPU 35088, GPU 12626 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 11541 (MiB)
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +66, now: CPU 23982, GPU 12093 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +72, now: CPU 23983, GPU 12165 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 23082 (MiB)
I0124 12:20:40.118860 13788 gpt_server.cc:290] Engines Deserialization Completed
I0124 12:20:40.366228 13788 gpt_core.cc:64] GPTCore 0: MPI Rank - 0 at Device Id - 0
I0124 12:20:40.366343 13788 gpt_core.cc:262] Engine - Vocab size: 50401 Padded vocab size: 50401 Beam width: 4
I0124 12:20:40.369578 13788 gpt_core.cc:90] Engine - Device Memory requirements: 6539709440
I0124 12:20:40.369586 13788 gpt_core.cc:99] Engine - Total Number of Optimization Profiles: 2
I0124 12:20:40.369588 13788 gpt_core.cc:100] Engine - Number of Optimization Profiles Per Core: 2
I0124 12:20:40.369591 13788 gpt_core.cc:101] Engine - Start Index of Optimization Profiles: 0
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 893, GPU 18868 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +64, now: CPU 893, GPU 18932 (MiB)
I0124 12:20:40.602331 13788 gpt_core.cc:115] Setting Opt.Prof. to 0
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 930, GPU 19032 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +72, now: CPU 930, GPU 19104 (MiB)
I0124 12:20:40.817628 13788 gpt_core.cc:115] Setting Opt.Prof. to 1
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
[mlperf-inference-jay-x86-64-19218:13788] *** Process received signal ***
[mlperf-inference-jay-x86-64-19218:13788] Signal: Aborted (6)
[mlperf-inference-jay-x86-64-19218:13788] Signal code: (-6)
[mlperf-inference-jay-x86-64-19218:13788] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f0c5c775420]
[mlperf-inference-jay-x86-64-19218:13788] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f0c5c26400b]
[mlperf-inference-jay-x86-64-19218:13788] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f0c5c243859]
[mlperf-inference-jay-x86-64-19218:13788] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e8d1)[0x7f0c5c61b8d1]
[mlperf-inference-jay-x86-64-19218:13788] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c)[0x7f0c5c62737c]
[mlperf-inference-jay-x86-64-19218:13788] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7)[0x7f0c5c6273e7]
[mlperf-inference-jay-x86-64-19218:13788] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x4d)[0x7f0c5c6276ed]
[mlperf-inference-jay-x86-64-19218:13788] [ 7] ./build/bin/harness_gpt(+0x715c1)[0x564f8dfb35c1]
[mlperf-inference-jay-x86-64-19218:13788] [ 8] ./build/bin/harness_gpt(+0x6b45b)[0x564f8dfad45b]
[mlperf-inference-jay-x86-64-19218:13788] [ 9] ./build/bin/harness_gpt(+0x5d0fe)[0x564f8df9f0fe]
[mlperf-inference-jay-x86-64-19218:13788] [10] ./build/bin/harness_gpt(+0x2fc84)[0x564f8df71c84]
[mlperf-inference-jay-x86-64-19218:13788] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0c5c245083]
[mlperf-inference-jay-x86-64-19218:13788] [12] ./build/bin/harness_gpt(+0x3074e)[0x564f8df7274e]
[mlperf-inference-jay-x86-64-19218:13788] *** End of error message ***
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 232, in <module>
The issue has been solved by modifying the gpu_batch_size parameter in custom.py. It is now able to run the gptj benchmark. Thanks to all.
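For anyone else who hits the std::bad_alloc above: the batch size lives in the per-system entry of custom.py. A minimal sketch of the kind of edit meant here, with the class and field names inferred from the values logged above (the new batch size is an assumption to tune until the harness fits in memory):

```python
# Hypothetical custom.py entry -- names inferred from the log above
# (system_id k905_h100_x2, HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP),
# so treat this as a sketch, not the actual file contents.
from . import *  # the generated custom.py pulls in ConfigRegistry, KnownSystem, etc.

@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class K905_H100_X2(OfflineGPUBaseConfig):
    system = KnownSystem.k905_h100_x2
    gpu_batch_size = 16          # reduced from the failing 32; less GPU memory per engine run
    offline_expected_qps = 76
```

A smaller gpu_batch_size shrinks the activation and beam-search buffers each optimization profile has to allocate, which is consistent with the allocation failure appearing right as the harness switches profiles.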
I have prior experience running inference v3.0 with two A100 PCIe GPU cards, and the gptj model is new in inference v3.1. I followed the link below: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA#readme
Here is the procedure, for your reference: 1) make prebuild, to enter the container environment; 2) make build. (See the command summary just below.)
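In command form (the two steps above, plus the run command from the session that follows):

```sh
make prebuild    # enter the container environment
make build       # build the harness binaries and TRT-LLM
make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
```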
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-22 10:34:01,320 main.py:230 INFO] Detected system ID: KnownSystem.K905_A100X2
[2024-01-22 10:34:02,953 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:09,676 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
    raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:10] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.71s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict if f.read(7) == "version": File "/usr/lib/python3.8/codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
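Note: "PytorchStreamReader failed reading zip archive: failed finding central directory" means torch.load could not parse pytorch_model-00002-of-00003.bin as a zip container, which usually points to a truncated or corrupted checkpoint shard rather than a code bug. A quick structural check before rebuilding (paths taken from the log above; reference checksums would have to come from wherever the model was downloaded):

```sh
# A truncated shard usually stands out by size next to its siblings.
ls -l build/models/GPTJ-6B/checkpoint-final/pytorch_model-*.bin
# A healthy torch checkpoint opens as a zip archive; a corrupt one raises BadZipFile here.
python3 -c "import zipfile; zipfile.ZipFile('build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin').testzip()"
```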
[2024-01-22 10:34:40,406 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:46] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:47,175 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:48] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.90s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict if f.read(7) == "version": File "/usr/lib/python3.8/codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/work/code/main.py", line 232, in
main(main_args, DETECTED_SYSTEM)
File "/work/code/main.py", line 145, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/work/code/main.py", line 203, in dispatch_action
handler.run()
File "/work/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/work/code/actionhandler/base.py", line 186, in handle_failure
self.action_handler.handle_failure()
File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$