mlcommons / inference_results_v3.1

This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
https://mlcommons.org/benchmarks/inference-datacenter/
Apache License 2.0

Build for gptj docker fails #14

Closed ChristinaHsu0115 closed 7 months ago

ChristinaHsu0115 commented 7 months ago

I have experience running inference v3.0 with 2 A100 PCIe GPU cards, and the gptj model is new in inference v3.1. I followed the link below: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA#readme

Here is the procedure, for your reference. First I ran make prebuild to enter the container environment, then make build. After that:

  1. download gptj dataset
  2. download gptj model
  3. preprocess the gptj data
  4. create a custom config file and modify it with the correct parameters
  5. run the gptj benchmark with the Offline scenario (a sanity-check sketch for steps 1-4 follows this list)

I got the error message below. Does anyone know how to fix the problem?
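Before the error log, a minimal sanity check for steps 1-4. The paths are taken from the harness logs later in this thread; the snippet is purely illustrative and is not part of the NVIDIA Makefile.

```python
# Illustrative sanity check (not part of the benchmark code): verify that the
# model download and preprocessing steps left the files the gptj harness expects.
from pathlib import Path

expected = [
    "build/models/GPTJ-6B/checkpoint-final",  # HF checkpoint directory
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy",
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy",
    "build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy",
]

for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"{status:8} {p}")
```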

(mlperf) test@mlperf-inference-test-x86-64-7440:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-22 10:34:01,320 main.py:230 INFO] Detected system ID: KnownSystem.K905_A100X2
[2024-01-22 10:34:02,953 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:09,676 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
    raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:10] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.71s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.71s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/TRTLLM/examples/gptj/build.py", line 473, in <module>
    args = parse_arguments()
  File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
    hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
[2024-01-22 10:34:40,406 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:46] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:47,175 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
    raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:48] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.90s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05, 2.90s/it]
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/TRTLLM/examples/gptj/build.py", line 473, in <module>
    args = parse_arguments()
  File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
    hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 232, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 145, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main.py", line 203, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/work/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$
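For reference, a minimal sketch (paths taken from the log above; illustrative only, not part of the benchmark code) that tries to load each checkpoint shard on the CPU to find out which .bin file is corrupted:

```python
# Illustrative shard check: a corrupted shard raises the same
# "PytorchStreamReader failed reading zip archive" error seen in the log above.
from pathlib import Path

import torch

ckpt_dir = Path("build/models/GPTJ-6B/checkpoint-final")
for shard in sorted(ckpt_dir.glob("pytorch_model-*.bin")):
    try:
        torch.load(shard, map_location="cpu")
        print(f"OK      {shard.name}")
    except Exception as exc:
        print(f"BROKEN  {shard.name}: {exc}")
```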

lapp0 commented 7 months ago

Try downgrading transformers to 4.36.2
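For anyone unsure how to apply this inside the container, a small illustrative check (4.36.2 is the version suggested above; the pip pin is shown only as a comment because exact dependency handling for fsspec/tqdm/huggingface_hub may differ in your environment):

```python
# Illustrative check (not part of the benchmark code): confirm which
# transformers version is installed inside the MLPerf container.
import transformers

print(transformers.__version__)
# If it is newer than the suggested pin, reinstall it, e.g.:
#   python3 -m pip install transformers==4.36.2
```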

psyhtest commented 7 months ago

@ChristinaHsu0115 Please consider renaming the issue. AMD did not submit to v3.1. You are using NVIDIA's code.

/cc @nv-ananjappa @mrmhodak

ChristinaHsu0115 commented 7 months ago

@lapp0 Thanks for the help. I don't know exactly how to downgrade transformers to 4.36.2; it has lots of dependencies on fsspec, tqdm, huggingface, etc. So I made two changes, as below:

  1. I downloaded the pytorch_model .bin file from another site. (Note: make download_model BENCHMARKS="gpt" fetches the checkpoint split into 3 .bin shards, and the 2nd shard was broken.)
  2. I set "ignore_mismatched_sizes = True" in the from_pretrained call (a minimal sketch of this workaround follows this list).

With these two changes I was able to run the gptj benchmark, but I got another problem, shown in the log after the sketch. Does anyone know how to fix it?
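A minimal sketch of the step-2 workaround. The checkpoint path comes from the logs in this thread; ignore_mismatched_sizes is a standard from_pretrained argument, but whether it is the right fix for a replaced shard is an assumption, not something confirmed here.

```python
# Illustrative sketch of the step-2 workaround: load the checkpoint while
# tolerating parameter-shape mismatches from the replaced shard.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "build/models/GPTJ-6B/checkpoint-final",
    ignore_mismatched_sizes=True,
)
```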

(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-24 12:17:41,391 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:17:43,151 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/24/2024-12:17:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 44, GPU 942 (MiB)
[01/24/2024-12:17:50] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +4333, GPU +1150, now: CPU 4482, GPU 2094 (MiB)
[2024-01-24 12:17:51,765 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/k905_h100_x2/gptj/Offline, use_fp8: False
command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/k905_h100_x2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
[2024-01-24 12:20:20,141 gptj6b.py:122 INFO] Engine built complete and took 148.37598872184753s. Stored at ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[2024-01-24 12:20:20,141 generate_engines.py:176 INFO] Finished building engines for gptj benchmark in Offline scenario. Time taken to generate engines: 156.99001169204712 seconds
make[1]: Leaving directory '/work'
make[1]: Entering directory '/work'
[2024-01-24 12:20:25,648 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:20:25,751 harness.py:236 INFO] The harness will load 1 plugins: ['build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so']
[2024-01-24 12:20:25,751 generate_conf_files.py:107 INFO] Generated measurements/ entries for k905_h100_x2_TRT/gptj-99/Offline
[2024-01-24 12:20:25,752 __init__.py:46 INFO] Running command: ./build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj
[2024-01-24 12:20:25,752 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.GPTJ
buffer_manager_thread_count : 0
coalesced_tensor : True
data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//data
enable_sort : True
gpu_batch_size : 32
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /work/build/logs/2024.01.24-12.17.38
num_sort_segments : 2
offline_expected_qps : 76
precision : fp16
preprocessed_data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.5849335560000002, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584933556000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 2})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='k905_h100_x2')
tensor_parallelism : 1
tensor_path : build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy
use_graphs : False
system_id : k905_h100_x2
config_name : k905_h100_x2_gptj_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
use_cpu : False
use_inferentia : False
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING GPT_HARNESS # ./build/bin/harness_gpt
[I] Loading plugin: build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so
I0124 12:20:26.327747 13788 main_gpt.cc:122] Found 2 GPUs
I0124 12:20:27.282594 13788 gpt_server.cc:215] Loading 1 engine(s)
I0124 12:20:27.282637 13788 gpt_server.cc:218] Engine Path: ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +66, now: CPU 35086, GPU 12554 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +72, now: CPU 35088, GPU 12626 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 11541 (MiB)
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +66, now: CPU 23982, GPU 12093 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +72, now: CPU 23983, GPU 12165 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 23082 (MiB)
I0124 12:20:40.118860 13788 gpt_server.cc:290] Engines Deserialization Completed
I0124 12:20:40.366228 13788 gpt_core.cc:64] GPTCore 0: MPI Rank - 0 at Device Id - 0
I0124 12:20:40.366343 13788 gpt_core.cc:262] Engine - Vocab size: 50401 Padded vocab size: 50401 Beam width: 4
I0124 12:20:40.369578 13788 gpt_core.cc:90] Engine - Device Memory requirements: 6539709440
I0124 12:20:40.369586 13788 gpt_core.cc:99] Engine - Total Number of Optimization Profiles: 2
I0124 12:20:40.369588 13788 gpt_core.cc:100] Engine - Number of Optimization Profiles Per Core: 2
I0124 12:20:40.369591 13788 gpt_core.cc:101] Engine - Start Index of Optimization Profiles: 0
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 893, GPU 18868 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +64, now: CPU 893, GPU 18932 (MiB)
I0124 12:20:40.602331 13788 gpt_core.cc:115] Setting Opt.Prof. to 0
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 930, GPU 19032 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +72, now: CPU 930, GPU 19104 (MiB)
I0124 12:20:40.817628 13788 gpt_core.cc:115] Setting Opt.Prof. to 1
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[mlperf-inference-jay-x86-64-19218:13788] *** Process received signal ***
[mlperf-inference-jay-x86-64-19218:13788] Signal: Aborted (6)
[mlperf-inference-jay-x86-64-19218:13788] Signal code: (-6)
[mlperf-inference-jay-x86-64-19218:13788] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f0c5c775420]
[mlperf-inference-jay-x86-64-19218:13788] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f0c5c26400b]
[mlperf-inference-jay-x86-64-19218:13788] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f0c5c243859]
[mlperf-inference-jay-x86-64-19218:13788] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e8d1)[0x7f0c5c61b8d1]
[mlperf-inference-jay-x86-64-19218:13788] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c)[0x7f0c5c62737c]
[mlperf-inference-jay-x86-64-19218:13788] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7)[0x7f0c5c6273e7]
[mlperf-inference-jay-x86-64-19218:13788] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x4d)[0x7f0c5c6276ed]
[mlperf-inference-jay-x86-64-19218:13788] [ 7] ./build/bin/harness_gpt(+0x715c1)[0x564f8dfb35c1]
[mlperf-inference-jay-x86-64-19218:13788] [ 8] ./build/bin/harness_gpt(+0x6b45b)[0x564f8dfad45b]
[mlperf-inference-jay-x86-64-19218:13788] [ 9] ./build/bin/harness_gpt(+0x5d0fe)[0x564f8df9f0fe]
[mlperf-inference-jay-x86-64-19218:13788] [10] ./build/bin/harness_gpt(+0x2fc84)[0x564f8df71c84]
[mlperf-inference-jay-x86-64-19218:13788] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0c5c245083]
[mlperf-inference-jay-x86-64-19218:13788] [12] ./build/bin/harness_gpt(+0x3074e)[0x564f8df7274e]
[mlperf-inference-jay-x86-64-19218:13788] *** End of error message ***
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 232, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 145, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main.py", line 203, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/work/code/actionhandler/run_harness.py", line 193, in handle_failure
    raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
  File "/work/code/actionhandler/run_harness.py", line 162, in handle
    result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
  File "/work/code/common/harness.py", line 339, in run_harness
    output = run_command(self._construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
  File "/work/code/common/__init__.py", line 67, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj' returned non-zero exit status 134.
make[1]: *** [Makefile:45: run_harness] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:32: run] Error 2
(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$

ChristinaHsu0115 commented 7 months ago

The issue has been solved by lowering the gpu_batch_size parameter in custom.py. The gptj benchmark now runs. Thanks to all. [screenshot: chrome_2024-01-25_14-56-35]
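For readers hitting the same std::bad_alloc, a hypothetical sketch of the kind of custom.py entry being described. The decorator and base-class pattern mirrors the GPU config classes in NVIDIA's closed/NVIDIA submission code as far as I know; the class name and the lowered batch size of 16 are assumptions, while the system ID, the original gpu_batch_size of 32, and offline_expected_qps of 76 come from the logs above.

```python
# Hypothetical configs/gptj/Offline/custom.py entry (a sketch, not the
# poster's actual file). Lowering gpu_batch_size reduces the harness's GPU
# memory footprint, which is what resolved the crash above.
from . import *


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class K905_H100_X2(OfflineGPUBaseConfig):
    system = KnownSystem.k905_h100_x2
    gpu_batch_size = 16        # assumed value, lowered from the failing 32
    offline_expected_qps = 76
```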