Problems getting ResNet50 to run in CM

WarrenSchultz commented 5 months ago

I have tried various combinations of CUDA, cuDNN, and TensorRT versions, and a variety of approaches, to run ResNet50 v1.5 with CM on Ubuntu/WSL2/Windows 11/CUDA.

Most recently, I've been using this for reference, but I'm still having problems, even when using containers.

The failure condition I most commonly get follows:

Traceback (most recent call last):
  File "/home/cmuser/.local/bin/cm", line 8, in <module>
    sys.exit(run())
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1485, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py", line 215, in preprocess
    r = cm.access(ii)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4067, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module_misc.py", line 1772, in docker
    r = script_automation._run_deps(deps, [], env, {}, {}, {}, {}, '', {}, '', False, '', verbose, show_time, ' ', run_state)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3038, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 727, in _run
    remembered_selections.append({'type': 'script',
AttributeError: 'dict' object has no attribute 'append'

arjunsuresh commented 5 months ago

Hi @WarrenSchultz, to reproduce MLPerf inference 3.1 and later submissions we use Nvidia docker, so there is no need to manually install cuDNN and TensorRT. Please run the commands below:

cm rm repo mlcommons@ck
cm pull repo gateoverflow@cm4mlops
cm docker script --tags=build,nvidia,inference,server --docker_cache=no --docker_cm_repo=gateoverflow@cm4mlops

After this, you should be inside the container in interactive mode, where you can run the cm inference commands.

WarrenSchultz commented 5 months ago

Thanks, I'll give that a shot. I thought I tried that already, but it's possible I missed something.

arjunsuresh commented 5 months ago

You're welcome, @WarrenSchultz. I believe you're hitting the error because you're using the main branch of cm4mlops and not the mlperf-inference branch. The fork above tracks that branch, because CM does not currently support selecting a branch when pulling repos.

WarrenSchultz commented 5 months ago

@arjunsuresh Hm. Still having issues, am I using the wrong command lines?

4.0:

cm run script --tags=run-mlperf,inference,_r4.0,_performance-only,_short --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=test --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time

[2024-05-14 11:00:48,546 preprocess_data.py:124 INFO] Preprocessing done.
Finished preprocessing all the datasets!
             ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference-nvidia/customize.py

      * cm run script "save mlperf inference state"
             ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/save-mlperf-inference-implementation-state/customize.py

      * cm run script "get generic-python-lib _onnx-graphsurgeon"
           ! load /home/cmuser/CM/repos/local/cache/6ae2617594f84f4f/cm-cached-state.json

      * cm run script "get generic-python-lib _package.onnx"
           ! load /home/cmuser/CM/repos/local/cache/238f26c6bf6a4429/cm-cached-state.json
           ! cd /home/cmuser/CM/repos/local/cache/734eb89ff5f44166
           ! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference-nvidia/run.sh from tmp-run.sh
make generate_engines RUN_ARGS=' --benchmarks=resnet50 --scenarios=offline  --test_mode=PerformanceOnly  --gpu_batch_size=64 --no_audit_verify  '
[2024-05-14 11:00:53,958 main.py:230 INFO] Detected system ID: KnownSystem.hpsut
[2024-05-14 11:00:54,751 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[05/14/2024-11:00:54] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 39, GPU 1183 (MiB)
[05/14/2024-11:01:00] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +314, now: CPU 1953, GPU 1497 (MiB)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 512, in build_engines
    self.mitten_builder.run(self.legacy_scratch, None)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 315, in run
    network = self.create_network(self.builder, subnetwork_name=subnet_name)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 177, in create_network
    rn50_gs = RN50GraphSurgeon(self.model_path,
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 232, in __init__
    super().__init__(onnx_path,
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 114, in __init__
    self.graph = self.import_onnx(onnx_path)
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 145, in import_onnx
    return gs.import_onnx(model)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 508, in import_onnx
    return OnnxImporter.import_graph(
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 430, in import_graph
    get_tensor(initializer)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 424, in get_tensor
    subgraph_tensor_map[name] = OnnxImporter.import_tensor(onnx_tensor)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 213, in import_tensor
    return Constant(name=onnx_tensor.name, values=LazyValues(onnx_tensor), data_location=data_location)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/ir/tensor.py", line 211, in __init__
    self.dtype = get_onnx_tensor_dtype(self.tensor)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 129, in get_onnx_tensor_dtype
    dtype = get_numpy_type(onnx_dtype)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 95, in get_numpy_type
    onnx.TensorProto.FLOAT8E4M3FN,
AttributeError: FLOAT8E4M3FN
[2024-05-14 11:01:01,991 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[05/14/2024-11:01:01] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 39, GPU 1183 (MiB)
[05/14/2024-11:01:07] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +314, now: CPU 1953, GPU 1497 (MiB)
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 512, in build_engines
    self.mitten_builder.run(self.legacy_scratch, None)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 315, in run
    network = self.create_network(self.builder, subnetwork_name=subnet_name)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 177, in create_network
    rn50_gs = RN50GraphSurgeon(self.model_path,
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 232, in __init__
    super().__init__(onnx_path,
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 114, in __init__
    self.graph = self.import_onnx(onnx_path)
  File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 145, in import_onnx
    return gs.import_onnx(model)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 508, in import_onnx
    return OnnxImporter.import_graph(
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 430, in import_graph
    get_tensor(initializer)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 424, in get_tensor
    subgraph_tensor_map[name] = OnnxImporter.import_tensor(onnx_tensor)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 213, in import_tensor
    return Constant(name=onnx_tensor.name, values=LazyValues(onnx_tensor), data_location=data_location)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/ir/tensor.py", line 211, in __init__
    self.dtype = get_onnx_tensor_dtype(self.tensor)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 129, in get_onnx_tensor_dtype
    dtype = get_numpy_type(onnx_dtype)
  File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 95, in get_numpy_type
    onnx.TensorProto.FLOAT8E4M3FN,
AttributeError: FLOAT8E4M3FN
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 232, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 145, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 203, in dispatch_action
    handler.run()
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 183, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [Makefile:37: generate_engines] Error 1

CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)

3.1:

cm run script --tags=run-mlperf,inference,_r3.1,_performance-only,_short --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=test --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time

`[ 34%] Linking CXX static library libtritontableprinter.a
cd /home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server/_deps/repo-common-build && /home/cmuser/CM/repos/local/cache/bdabc1e4cac745e0/bin/cmake -P CMakeFiles/triton-common-table-printer.dir/cmake_clean_target.cmake
cd /home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server/_deps/repo-common-build && /home/cmuser/CM/repos/local/cache/bdabc1e4cac745e0/bin/cmake -E cmake_link_script CMakeFiles/triton-common-table-printer.dir/link.txt --verbose=0
make[7]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
[ 34%] Built target triton-common-table-printer
make[6]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
make[5]: [Makefile:136: all] Error 2
make[5]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
make[4]: [CMakeFiles/triton-server.dir/build.make:86: triton-server/src/triton-server-stamp/triton-server-build] Error 2
make[4]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[3]: [CMakeFiles/Makefile2:137: CMakeFiles/triton-server.dir/all] Error 2
make[3]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[2]: [Makefile:136: all] Error 2
make[2]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
error: build failed
make[1]: [Makefile.build:224: build_triton] Error 1
make[1]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA'
make: [/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/Makefile.build:170: build] Error 2

CM error: Portable CM script failed (name = build-mlperf-inference-server-nvidia, return code = 256)`

Edit: This is running on Docker-CE on WSL2, using the Ubuntu 22.04 distro.

WarrenSchultz commented 5 months ago

One more data point. Running tensorflow in reference mode returns the following:

It says it's missing some GPU libraries, or is that an irrelevant message? But the numbers definitely look more like CPU than GPU results for a 4000 Ada.

`CM script::benchmark-program/run.sh

Run Directory: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/vision/classification_and_detection

CMD: ./run_local.sh tf resnet50 gpu --scenario Offline --mlperf_conf '/home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf' --threads 2 --user_conf '/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf' --use_preprocessed_dataset --cache_dir /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --dataset-list /home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt 2>&1 | tee /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/console.out

     ! cd /home/cmuser
     ! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

python3 python/main.py --profile resnet50-tf --mlperf_conf ../../mlperf.conf --model "/home/cmuser/CM/repos/local/cache/28478cb4a4994866/resnet50_v1.pb" --dataset-path /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --output "/home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1" --scenario Offline --mlperf_conf /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf --threads 2 --user_conf /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf --use_preprocessed_dataset --cache_dir /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --dataset-list /home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt
INFO:main:Namespace(accuracy=False, audit_conf='audit.config', backend='tensorflow', cache=0, cache_dir='/home/cmuser/CM/repos/local/cache/47c7051a9af44613', count=None, data_format=None, dataset='imagenet', dataset_list='/home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt', dataset_path='/home/cmuser/CM/repos/local/cache/47c7051a9af44613', debug=False, find_peak_performance=False, inputs=['input_tensor:0'], max_batchsize=32, max_latency=None, mlperf_conf='/home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf', model='/home/cmuser/CM/repos/local/cache/28478cb4a4994866/resnet50_v1.pb', model_name='resnet50', output='/home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1', outputs=['ArgMax:0'], performance_sample_count=None, preprocessed_dir=None, profile='resnet50-tf', qps=None, samples_per_query=8, scenario='Offline', threads=2, time=None, use_preprocessed_dataset=True, user_conf='/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf')
2024-05-14 11:29:07.643748: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-14 11:29:07.645081: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-14 11:29:07.668378: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-14 11:29:07.668722: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-14 11:29:08.022557: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO:imagenet:Loading 50000 preprocessed images using 2 threads
INFO:imagenet:reduced image list, 49500 images not found
INFO:imagenet:loaded 500 images, cache=0, already_preprocessed=True, took=1.4sec
WARNING:tensorflow:From /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/vision/classification_and_detection/python/backend_tf.py:47: FastGFile.init (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version. Instructions for updating: Use tf.gfile.GFile.
WARNING:tensorflow:From /home/cmuser/.local/lib/python3.8/site-packages/tensorflow/python/tools/strip_unused_lib.py:84: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /home/cmuser/.local/lib/python3.8/site-packages/tensorflow/python/tools/optimize_for_inference_lib.py:112: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
2024-05-14 11:29:35.874583: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-05-14 11:29:35.965735: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2024-05-14 11:29:36.106365: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
INFO:main:starting TestScenario.Offline
TestScenario.Offline qps=8.42, mean=0.0914, time=0.119, queries=1, tiles=50.0:0.0914,80.0:0.0914,90.0:0.0914,95.0:0.0914,99.0:0.0914,99.9:0.0914
     ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/benchmark-program/customize.py

cm run script "save mlperf inference state"
     ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/save-mlperf-inference-implementation-state/customize.py
     ! cd /home/cmuser
     ! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference/run.sh from tmp-run.sh
     ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference/customize.py

cm run script "get mlperf sut description"
     ! load /home/cmuser/CM/repos/local/cache/9212b01e2bca442e/cm-cached-state.json

SUT: 8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config, model: resnet50, scenario: Offline, target_qps updated as 108.894
New config stored in /home/cmuser/CM/repos/local/cache/0632aee6fbb54cdb/configs/8c668fa856f8/nvidia_original-implementation/gpu-device/tensorrt-framework/framework-version-vdefault/default_config-config.yaml
[2024-05-14 11:29:37,610 log_parser.py:50 INFO] Sucessfully loaded MLPerf log from /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/mlperf_log_detail.txt.
[2024-05-14 11:29:37,611 log_parser.py:50 INFO] Sucessfully loaded MLPerf log from /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/mlperf_log_detail.txt.
Running: /usr/bin/python3 /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/dump-pip-freeze/dump.py

8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config
+----------+----------+----------+---------+-----------------+---------------------------------+
|  Model   | Scenario | Accuracy |   QPS   | Latency (in ms) | Power Efficiency (in samples/J) |
+----------+----------+----------+---------+-----------------+---------------------------------+
| resnet50 | Offline  |    -     | 108.894 |        -        |                                 |
+----------+----------+----------+---------+-----------------+---------------------------------+

The MLPerf inference results are stored at /home/cmuser/test_results

   ! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/run-mlperf-inference-app/customize.py

Path to the MLPerf inference benchmark reference sources: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference
Path to the MLPerf inference reference configuration file: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf

running time of script "run,common,generate-run-cmds,run-mlperf,run-mlperf-inference,vision,mlcommons,mlperf,inference,reference": 112.66 sec.`
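
The "Could not find cuda drivers" and "Cannot dlopen some GPU libraries ... Skipping registering GPU devices" messages in the log above suggest that TensorFlow fell back to the CPU, which would be consistent with CPU-like throughput. A minimal sketch for confirming this from inside the container, assuming TensorFlow 2.x is importable there; the helper name `visible_gpus` is hypothetical and not part of CM:

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see.

    Returns None when TensorFlow is not installed in this environment.
    The helper name is illustrative only; it is not part of CM or MLPerf.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is the TensorFlow 2.x API for
    # enumerating devices; an empty list means no GPU was registered.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow sees no GPU; inference will run on the CPU")
    else:
        print("TensorFlow sees these GPUs:", gpus)
```

An empty list here would confirm that the reference TensorFlow run was a CPU run, regardless of the `gpu` argument passed to run_local.sh.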

arjunsuresh commented 5 months ago

@WarrenSchultz The first command (4.0) is the one expected to be run inside the docker container. The TensorRT engine generation is failing, so the docker build succeeded, right? Last time it worked fine for me on an RTX 4090 (same architecture as the 4000 Ada, right?). Let me give it a try again now.

WarrenSchultz commented 5 months ago

Yup, run within the docker context. Thanks for checking.

arjunsuresh commented 5 months ago

I'm able to reproduce the error. I think we need to fix the onnx version. Let me give an update.
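
The traceback above ends in `AttributeError: FLOAT8E4M3FN` when onnx_graphsurgeon reads `onnx.TensorProto.FLOAT8E4M3FN`, which points to an installed onnx package that predates that enum value. A minimal sketch for checking this in the container; the helper name `onnx_float8_support` is hypothetical, and it only inspects whatever onnx is importable in the current Python environment:

```python
import importlib


def onnx_float8_support(module_name="onnx"):
    """Report whether the installed onnx exposes TensorProto.FLOAT8E4M3FN.

    Returns None if the module cannot be imported, otherwise a
    (version, has_float8) tuple. Illustrative helper, not part of CM.
    """
    try:
        onnx = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(onnx, "__version__", "unknown")
    tensor_proto = getattr(onnx, "TensorProto", None)
    return version, hasattr(tensor_proto, "FLOAT8E4M3FN")


if __name__ == "__main__":
    status = onnx_float8_support()
    if status is None:
        print("onnx is not installed")
    else:
        version, has_float8 = status
        print(f"onnx {version}: FLOAT8E4M3FN available = {has_float8}")
```

If the second value is False, the installed onnx and onnx_graphsurgeon versions are mismatched in the way the traceback shows.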

arjunsuresh commented 5 months ago

This was the issue. Please run cm pull repo inside the container and it should be fine.

WarrenSchultz commented 5 months ago

@arjunsuresh That did the trick, thank you. The only oddity now is that the run comes back as an invalid test, despite --execution_mode being set to valid?

cm run script --tags=run-mlperf,inference,_performance-only,_full \
   --division=open \
   --category=edge \
   --device=cuda \
   --model=resnet50 \
   --precision=float32 \
   --implementation=nvidia \
   --backend=tensorrt \
   --scenario=Offline \
   --execution_mode=valid \
   --power=no \
   --adr.python.version_min=3.8 \
   --clean \
   --compliance=no \
   --quiet \
   --time

MLPerf Results Summary
SUT name : LWIS_Server
Scenario : Offline
Mode     : PerformanceOnly
Samples per second: 13535.5
Result is : INVALID
  Min duration satisfied : NO
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
Recommendations:
 * Increase expected QPS so the loadgen pre-generates a larger (coalesced) query.

arjunsuresh commented 5 months ago

You can pass --offline_target_qps=13540 to the run command. Otherwise the run can be marked invalid if the automatically determined QPS is too low.
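
To see why a low target QPS leads to "Min duration satisfied : NO", here is a rough back-of-the-envelope sketch. It assumes a simplified model in which the Offline query holds about target_qps × min_duration samples, and it assumes a 600-second minimum duration; both are assumptions for illustration, and the real loadgen applies additional factors. The function names are hypothetical, and 108.894 is only the target reported in the earlier reference-run log, used here as an example of a too-low value:

```python
def estimated_run_seconds(target_qps, achieved_qps, min_duration_s=600.0):
    """Estimate how long an Offline run lasts under a simplified model.

    Assumption: the query is sized as target_qps * min_duration_s samples,
    and the system drains it at achieved_qps samples per second.
    """
    samples_in_query = target_qps * min_duration_s
    return samples_in_query / achieved_qps


def min_duration_satisfied(target_qps, achieved_qps, min_duration_s=600.0):
    """True if the estimated run time reaches the minimum duration."""
    return estimated_run_seconds(target_qps, achieved_qps, min_duration_s) >= min_duration_s


if __name__ == "__main__":
    achieved = 13535.5  # "Samples per second" from the summary above
    for target in (108.894, 13540.0):
        seconds = estimated_run_seconds(target, achieved)
        ok = min_duration_satisfied(target, achieved)
        print(f"target_qps={target}: ~{seconds:.1f} s, min duration satisfied: {ok}")
```

Under this model, a target far below the achieved throughput produces a query that finishes in seconds, while a target at or slightly above the measured throughput keeps the run going for the full minimum duration.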

WarrenSchultz commented 5 months ago

That fixed it, thanks!

mlcommons / inference

Problems getting ResNet50 to run in CM #1698