mlcommons / inference_results_v3.1

This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
https://mlcommons.org/benchmarks/inference-datacenter/
Apache License 2.0

Build for gptj docker fails #14

Closed ChristinaHsu0115 closed 7 months ago

ChristinaHsu0115 commented 7 months ago

I have experience running inference v3.0 with 2 A100 PCIe GPU cards, and the gptj model is new in inference v3.1. I followed the link below: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA#readme

Here is the procedure, for your reference. First I ran make prebuild to enter the container environment, then make build. After that:

  1. download gptj dataset
  2. download gptj model
  3. preprocess the gptj data
  4. create a custom config file and modify it with the correct parameters
  5. run the gptj benchmark with the Offline scenario (a sanity-check sketch for steps 1-4 follows this list)

I got the error message below. Does anyone know how to fix the problem?
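Before the error log, a minimal sanity check for steps 1-4. The paths are taken from the harness logs later in this thread; the snippet is purely illustrative and is not part of the NVIDIA Makefile.

```python
# Illustrative sanity check (not part of the benchmark code): verify that the
# model download and preprocessing steps left the files the gptj harness expects.
from pathlib import Path

expected = [
    "build/models/GPTJ-6B/checkpoint-final",  # HF checkpoint directory
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy",
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy",
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy",
]

for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"{status:8} {p}")
```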

(mlperf) test@mlperf-inference-test-x86-64-7440:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-22 10:34:01,320 main.py:230 INFO] Detected system ID: KnownSystem.K905_A100X2
[2024-01-22 10:34:02,953 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:09,676 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
    raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:10] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.71s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.71s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/TRTLLM/examples/gptj/build.py", line 473, in <module>
    args = parse_arguments()
  File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
    hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
[2024-01-22 10:34:40,406 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:46] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:47,175 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
    raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:48] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.90s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.90s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/TRTLLM/examples/gptj/build.py", line 473, in <module>
    args = parse_arguments()
  File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
    hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 232, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 145, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main.py", line 203, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/work/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$
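For reference, a minimal sketch (paths taken from the log above; illustrative only, not part of the benchmark code) that tries to load each checkpoint shard on the CPU to find out which .bin file is corrupted:

```python
# Illustrative shard check: a corrupted shard raises the same
# "PytorchStreamReader failed reading zip archive" error seen in the log above.
from pathlib import Path

import torch

ckpt_dir = Path("build/models/GPTJ-6B/checkpoint-final")
for shard in sorted(ckpt_dir.glob("pytorch_model-*.bin")):
    try:
        torch.load(shard, map_location="cpu")
        print(f"OK      {shard.name}")
    except Exception as exc:
        print(f"BROKEN  {shard.name}: {exc}")
```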

lapp0 commented 7 months ago

Try downgrading transformers to 4.36.2
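For anyone unsure how to apply this inside the container, a small illustrative check (4.36.2 is the version suggested above; the pip pin is shown only as a comment because exact dependency handling for fsspec/tqdm/huggingface_hub may differ in your environment):

```python
# Illustrative check (not part of the benchmark code): confirm which
# transformers version is installed inside the MLPerf container.
import transformers

print(transformers.__version__)
# If it is newer than the suggested pin, reinstall it, e.g.:
#   python3 -m pip install transformers==4.36.2
```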

psyhtest commented 7 months ago

@ChristinaHsu0115 Please consider renaming the issue. AMD did not submit to v3.1. You are using NVIDIA's code.

/cc @nv-ananjappa @mrmhodak

ChristinaHsu0115 commented 7 months ago

@lapp0 Thanks for the help. I don't know exactly how to downgrade transformers to 4.36.2; it has lots of dependencies on fsspec, tqdm, huggingface, etc. So I made two changes, as below:

  1. I downloaded the pytorch_model .bin file from another site. (Note: make download_model BENCHMARKS="gpt" fetches the checkpoint split into 3 .bin shards, and the 2nd shard was broken.)
  2. I set "ignore_mismatched_sizes = True" in the from_pretrained call (a minimal sketch of this workaround follows this list).

With these two changes I was able to run the gptj benchmark, but I got another problem, shown in the log after the sketch. Does anyone know how to fix it?
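A minimal sketch of the step-2 workaround. The checkpoint path comes from the logs in this thread; ignore_mismatched_sizes is a standard from_pretrained argument, but whether it is the right fix for a replaced shard is an assumption, not something confirmed here.

```python
# Illustrative sketch of the step-2 workaround: load the checkpoint while
# tolerating parameter-shape mismatches from the replaced shard.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "build/models/GPTJ-6B/checkpoint-final",
    ignore_mismatched_sizes=True,
)
```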

(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-24 12:17:41,391 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:17:43,151 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/24/2024-12:17:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 44, GPU 942 (MiB)
[01/24/2024-12:17:50] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +4333, GPU +1150, now: CPU 4482, GPU 2094 (MiB)
[2024-01-24 12:17:51,765 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/k905_h100_x2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/k905_h100_x2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
[2024-01-24 12:20:20,141 gptj6b.py:122 INFO] Engine built complete and took 148.37598872184753s. Stored at ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[2024-01-24 12:20:20,141 generate_engines.py:176 INFO] Finished building engines for gptj benchmark in Offline scenario. Time taken to generate engines: 156.99001169204712 seconds
make[1]: Leaving directory '/work'
make[1]: Entering directory '/work'
[2024-01-24 12:20:25,648 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:20:25,751 harness.py:236 INFO] The harness will load 1 plugins: ['build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so']
[2024-01-24 12:20:25,751 generate_conf_files.py:107 INFO] Generated measurements/ entries for k905_h100_x2_TRT/gptj-99/Offline
[2024-01-24 12:20:25,752 __init__.py:46 INFO] Running command: ./build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj
[2024-01-24 12:20:25,752 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.GPTJ
buffer_manager_thread_count : 0
coalesced_tensor : True
data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//data
enable_sort : True
gpu_batch_size : 32
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /work/build/logs/2024.01.24-12.17.38
num_sort_segments : 2
offline_expected_qps : 76
precision : fp16
preprocessed_data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.5849335560000002, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584933556000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 2})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='k905_h100_x2')
tensor_parallelism : 1
tensor_path : build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy
use_graphs : False
system_id : k905_h100_x2
config_name : k905_h100_x2_gptj_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
use_cpu : False
use_inferentia : False
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING GPT_HARNESS # ./build/bin/harness_gpt
[I] Loading plugin: build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so
I0124 12:20:26.327747 13788 main_gpt.cc:122] Found 2 GPUs
I0124 12:20:27.282594 13788 gpt_server.cc:215] Loading 1 engine(s)
I0124 12:20:27.282637 13788 gpt_server.cc:218] Engine Path: ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +66, now: CPU 35086, GPU 12554 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +72, now: CPU 35088, GPU 12626 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 11541 (MiB)
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +66, now: CPU 23982, GPU 12093 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +72, now: CPU 23983, GPU 12165 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 23082 (MiB)
I0124 12:20:40.118860 13788 gpt_server.cc:290] Engines Deserialization Completed
I0124 12:20:40.366228 13788 gpt_core.cc:64] GPTCore 0: MPI Rank - 0 at Device Id - 0
I0124 12:20:40.366343 13788 gpt_core.cc:262] Engine - Vocab size: 50401 Padded vocab size: 50401 Beam width: 4
I0124 12:20:40.369578 13788 gpt_core.cc:90] Engine - Device Memory requirements: 6539709440
I0124 12:20:40.369586 13788 gpt_core.cc:99] Engine - Total Number of Optimization Profiles: 2
I0124 12:20:40.369588 13788 gpt_core.cc:100] Engine - Number of Optimization Profiles Per Core: 2
I0124 12:20:40.369591 13788 gpt_core.cc:101] Engine - Start Index of Optimization Profiles: 0
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 893, GPU 18868 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +64, now: CPU 893, GPU 18932 (MiB)
I0124 12:20:40.602331 13788 gpt_core.cc:115] Setting Opt.Prof. to 0
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 930, GPU 19032 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +72, now: CPU 930, GPU 19104 (MiB)
I0124 12:20:40.817628 13788 gpt_core.cc:115] Setting Opt.Prof. to 1
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[mlperf-inference-jay-x86-64-19218:13788] *** Process received signal ***
[mlperf-inference-jay-x86-64-19218:13788] Signal: Aborted (6)
[mlperf-inference-jay-x86-64-19218:13788] Signal code: (-6)
[mlperf-inference-jay-x86-64-19218:13788] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f0c5c775420]
[mlperf-inference-jay-x86-64-19218:13788] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f0c5c26400b]
[mlperf-inference-jay-x86-64-19218:13788] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f0c5c243859]
[mlperf-inference-jay-x86-64-19218:13788] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e8d1)[0x7f0c5c61b8d1]
[mlperf-inference-jay-x86-64-19218:13788] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c)[0x7f0c5c62737c]
[mlperf-inference-jay-x86-64-19218:13788] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7)[0x7f0c5c6273e7]
[mlperf-inference-jay-x86-64-19218:13788] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x4d)[0x7f0c5c6276ed]
[mlperf-inference-jay-x86-64-19218:13788] [ 7] ./build/bin/harness_gpt(+0x715c1)[0x564f8dfb35c1]
[mlperf-inference-jay-x86-64-19218:13788] [ 8] ./build/bin/harness_gpt(+0x6b45b)[0x564f8dfad45b]
[mlperf-inference-jay-x86-64-19218:13788] [ 9] ./build/bin/harness_gpt(+0x5d0fe)[0x564f8df9f0fe]
[mlperf-inference-jay-x86-64-19218:13788] [10] ./build/bin/harness_gpt(+0x2fc84)[0x564f8df71c84]
[mlperf-inference-jay-x86-64-19218:13788] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0c5c245083]
[mlperf-inference-jay-x86-64-19218:13788] [12] ./build/bin/harness_gpt(+0x3074e)[0x564f8df7274e]
[mlperf-inference-jay-x86-64-19218:13788] *** End of error message ***
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 232, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 145, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main.py", line 203, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/work/code/actionhandler/run_harness.py", line 193, in handle_failure
    raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
  File "/work/code/actionhandler/run_harness.py", line 162, in handle
    result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
  File "/work/code/common/harness.py", line 339, in run_harness
    output = run_command(self._construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
  File "/work/code/common/__init__.py", line 67, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj' returned non-zero exit status 134.
make[1]: *** [Makefile:45: run_harness] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:32: run] Error 2
(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$

ChristinaHsu0115 commented 7 months ago

The issue has been solved by lowering the gpu_batch_size parameter in custom.py. The gptj benchmark now runs. Thanks to all. [screenshot: chrome_2024-01-25_14-56-35]
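For readers hitting the same std::bad_alloc, a hypothetical sketch of the kind of custom.py entry being described. The decorator and base-class pattern mirrors the GPU config classes in NVIDIA's closed/NVIDIA submission code as far as I know; the class name and the lowered batch size of 16 are assumptions, while the system ID, the original gpu_batch_size of 32, and offline_expected_qps of 76 come from the logs above.

```python
# Hypothetical configs/gptj/Offline/custom.py entry (a sketch, not the
# poster's actual file). Lowering gpu_batch_size reduces the harness's GPU
# memory footprint, which is what resolved the crash above.
from . import *


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class K905_H100_X2(OfflineGPUBaseConfig):
    system = KnownSystem.k905_h100_x2
    gpu_batch_size = 16        # assumed value, lowered from the failing 32
    offline_expected_qps = 76
```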