WarrenSchultz closed this issue 5 months ago.
Hi @WarrenSchultz, for reproducing MLPerf inference 3.1 and later submissions we use the Nvidia docker image, so there is no need to manually add cuDNN and TensorRT. Please run the commands below:
cm rm repo mlcommons@ck
cm pull repo gateoverflow@cm4mlops
cm docker script --tags=build,nvidia,inference,server --docker_cache=no --docker_cm_repo=gateoverflow@cm4mlops
After this you should be inside the container in interactive mode, and you can run the cm inference commands from there.
Thanks, I'll give that a shot. I thought I tried that already, but it's possible I missed something.
You're welcome @WarrenSchultz. I believe you're facing the error because you're using the main branch of cm4mlops and not the mlperf-inference branch. The above fork points to that branch, since CM currently does not support selecting a branch while pulling repos.
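If you want to double-check which checkout you actually have, something like this should work (the path is my guess based on the default CM repo location visible in your logs):

```bash
# Verify which branch/commit of the cm4mlops fork is registered with CM.
# Path assumes the default CM repo location seen in the logs above.
cd $HOME/CM/repos/gateoverflow@cm4mlops
git log -1 --oneline   # the commit you are on
git status -sb         # the branch, if any
```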
@arjunsuresh Hm, still having issues. Am I using the wrong command lines?
4.0
cm run script --tags=run-mlperf,inference,_r4.0,_performance-only,_short --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=test --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time
[2024-05-14 11:00:48,546 preprocess_data.py:124 INFO] Preprocessing done.
Finished preprocessing all the datasets!
! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference-nvidia/customize.py
* cm run script "save mlperf inference state"
! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/save-mlperf-inference-implementation-state/customize.py
* cm run script "get generic-python-lib _onnx-graphsurgeon"
! load /home/cmuser/CM/repos/local/cache/6ae2617594f84f4f/cm-cached-state.json
* cm run script "get generic-python-lib _package.onnx"
! load /home/cmuser/CM/repos/local/cache/238f26c6bf6a4429/cm-cached-state.json
! cd /home/cmuser/CM/repos/local/cache/734eb89ff5f44166
! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/app-mlperf-inference-nvidia/run.sh from tmp-run.sh
make generate_engines RUN_ARGS=' --benchmarks=resnet50 --scenarios=offline --test_mode=PerformanceOnly --gpu_batch_size=64 --no_audit_verify '
[2024-05-14 11:00:53,958 main.py:230 INFO] Detected system ID: KnownSystem.hpsut
[2024-05-14 11:00:54,751 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[05/14/2024-11:00:54] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 39, GPU 1183 (MiB)
[05/14/2024-11:01:00] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +314, now: CPU 1953, GPU 1497 (MiB)
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 512, in build_engines
self.mitten_builder.run(self.legacy_scratch, None)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 315, in run
network = self.create_network(self.builder, subnetwork_name=subnet_name)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 177, in create_network
rn50_gs = RN50GraphSurgeon(self.model_path,
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 232, in __init__
super().__init__(onnx_path,
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 114, in __init__
self.graph = self.import_onnx(onnx_path)
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 145, in import_onnx
return gs.import_onnx(model)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 508, in import_onnx
return OnnxImporter.import_graph(
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 430, in import_graph
get_tensor(initializer)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 424, in get_tensor
subgraph_tensor_map[name] = OnnxImporter.import_tensor(onnx_tensor)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 213, in import_tensor
return Constant(name=onnx_tensor.name, values=LazyValues(onnx_tensor), data_location=data_location)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/ir/tensor.py", line 211, in __init__
self.dtype = get_onnx_tensor_dtype(self.tensor)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 129, in get_onnx_tensor_dtype
dtype = get_numpy_type(onnx_dtype)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 95, in get_numpy_type
onnx.TensorProto.FLOAT8E4M3FN,
AttributeError: FLOAT8E4M3FN
[2024-05-14 11:01:01,991 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[05/14/2024-11:01:01] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 39, GPU 1183 (MiB)
[05/14/2024-11:01:07] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +314, now: CPU 1953, GPU 1497 (MiB)
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 512, in build_engines
self.mitten_builder.run(self.legacy_scratch, None)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 315, in run
network = self.create_network(self.builder, subnetwork_name=subnet_name)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 177, in create_network
rn50_gs = RN50GraphSurgeon(self.model_path,
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 232, in __init__
super().__init__(onnx_path,
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 114, in __init__
self.graph = self.import_onnx(onnx_path)
File "/home/cmuser/.local/lib/python3.8/site-packages/nvmitten/nvidia/builder.py", line 145, in import_onnx
return gs.import_onnx(model)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 508, in import_onnx
return OnnxImporter.import_graph(
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 430, in import_graph
get_tensor(initializer)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 424, in get_tensor
subgraph_tensor_map[name] = OnnxImporter.import_tensor(onnx_tensor)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 213, in import_tensor
return Constant(name=onnx_tensor.name, values=LazyValues(onnx_tensor), data_location=data_location)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/ir/tensor.py", line 211, in __init__
self.dtype = get_onnx_tensor_dtype(self.tensor)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 129, in get_onnx_tensor_dtype
dtype = get_numpy_type(onnx_dtype)
File "/home/cmuser/.local/lib/python3.8/site-packages/onnx_graphsurgeon/importers/onnx_importer.py", line 95, in get_numpy_type
onnx.TensorProto.FLOAT8E4M3FN,
AttributeError: FLOAT8E4M3FN
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 232, in <module>
main(main_args, DETECTED_SYSTEM)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 145, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/main.py", line 203, in dispatch_action
handler.run()
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/base.py", line 186, in handle_failure
self.action_handler.handle_failure()
File "/home/cmuser/CM/repos/local/cache/a692a0114af64a10/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 183, in handle_failure
raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [Makefile:37: generate_engines] Error 1
CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)
3.1
cm run script --tags=run-mlperf,inference,_r3.1,_performance-only,_short --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=test --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time
[ 34%] Linking CXX static library libtritontableprinter.a
cd /home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server/_deps/repo-common-build && /home/cmuser/CM/repos/local/cache/bdabc1e4cac745e0/bin/cmake -P CMakeFiles/triton-common-table-printer.dir/cmake_clean_target.cmake
cd /home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server/_deps/repo-common-build && /home/cmuser/CM/repos/local/cache/bdabc1e4cac745e0/bin/cmake -E cmake_link_script CMakeFiles/triton-common-table-printer.dir/link.txt --verbose=0
make[7]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
[ 34%] Built target triton-common-table-printer
make[6]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
make[5]: *** [Makefile:136: all] Error 2
make[5]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/triton-server'
make[4]: *** [CMakeFiles/triton-server.dir/build.make:86: triton-server/src/triton-server-stamp/triton-server-build] Error 2
make[4]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[3]: *** [CMakeFiles/Makefile2:137: CMakeFiles/triton-server.dir/all] Error 2
make[3]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[2]: *** [Makefile:136: all] Error 2
make[2]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
error: build failed
make[1]: *** [Makefile.build:224: build_triton] Error 1
make[1]: Leaving directory '/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA'
make: *** [/home/cmuser/CM/repos/local/cache/4a2b1abf4b974cb3/repo/closed/NVIDIA/Makefile.build:170: build] Error 2
CM error: Portable CM script failed (name = build-mlperf-inference-server-nvidia, return code = 256)
Edit: This is running on Docker-CE on WSL2, using the Ubuntu 22.04 distro.
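For reference, the docker-level GPU smoke test I use on WSL2 (any recent CUDA base image should do; the tag here is just an example):

```bash
# Confirm Docker can see the GPU through WSL2 before debugging anything inside CM.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```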
One more data point. Running TensorFlow in reference mode returns the output below. It says it's missing some GPU libraries, or is that an irrelevant message? Either way, the numbers definitely look more like CPU than GPU results for a 4000 Ada.
CM script::benchmark-program/run.sh
Run Directory: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/vision/classification_and_detection
CMD: ./run_local.sh tf resnet50 gpu --scenario Offline --mlperf_conf '/home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf' --threads 2 --user_conf '/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf' --use_preprocessed_dataset --cache_dir /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --dataset-list /home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt 2>&1 | tee /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/console.out
! cd /home/cmuser
! call /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh
python3 python/main.py --profile resnet50-tf --mlperf_conf ../../mlperf.conf --model "/home/cmuser/CM/repos/local/cache/28478cb4a4994866/resnet50_v1.pb" --dataset-path /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --output "/home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1" --scenario Offline --mlperf_conf /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf --threads 2 --user_conf /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf --use_preprocessed_dataset --cache_dir /home/cmuser/CM/repos/local/cache/47c7051a9af44613 --dataset-list /home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt
INFO:main:Namespace(accuracy=False, audit_conf='audit.config', backend='tensorflow', cache=0, cache_dir='/home/cmuser/CM/repos/local/cache/47c7051a9af44613', count=None, data_format=None, dataset='imagenet', dataset_list='/home/cmuser/CM/repos/local/cache/ef9dd0d7a8e54e80/data/val.txt', dataset_path='/home/cmuser/CM/repos/local/cache/47c7051a9af44613', debug=False, find_peak_performance=False, inputs=['input_tensor:0'], max_batchsize=32, max_latency=None, mlperf_conf='/home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf', model='/home/cmuser/CM/repos/local/cache/28478cb4a4994866/resnet50_v1.pb', model_name='resnet50', output='/home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1', outputs=['ArgMax:0'], performance_sample_count=None, preprocessed_dir=None, profile='resnet50-tf', qps=None, samples_per_query=8, scenario='Offline', threads=2, time=None, use_preprocessed_dataset=True, user_conf='/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a22d72678c29414d9e035e295a6731d0.conf')
2024-05-14 11:29:07.643748: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-14 11:29:07.645081: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-14 11:29:07.668378: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-14 11:29:07.668722: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-14 11:29:08.022557: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO:imagenet:Loading 50000 preprocessed images using 2 threads
INFO:imagenet:reduced image list, 49500 images not found
INFO:imagenet:loaded 500 images, cache=0, already_preprocessed=True, took=1.4sec
WARNING:tensorflow:From /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/vision/classification_and_detection/python/backend_tf.py:47: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /home/cmuser/.local/lib/python3.8/site-packages/tensorflow/python/tools/strip_unused_lib.py:84: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /home/cmuser/.local/lib/python3.8/site-packages/tensorflow/python/tools/optimize_for_inference_lib.py:112: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
2024-05-14 11:29:35.874583: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-05-14 11:29:35.965735: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-05-14 11:29:36.106365: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
INFO:main:starting TestScenario.Offline
TestScenario.Offline qps=8.42, mean=0.0914, time=0.119, queries=1, tiles=50.0:0.0914,80.0:0.0914,90.0:0.0914,95.0:0.0914,99.0:0.0914,99.9:0.0914
! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/benchmark-program/customize.py
SUT: 8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config, model: resnet50, scenario: Offline, target_qps updated as 108.894
New config stored in /home/cmuser/CM/repos/local/cache/0632aee6fbb54cdb/configs/8c668fa856f8/nvidia_original-implementation/gpu-device/tensorrt-framework/framework-version-vdefault/default_config-config.yaml
[2024-05-14 11:29:37,610 log_parser.py:50 INFO] Sucessfully loaded MLPerf log from /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/mlperf_log_detail.txt.
[2024-05-14 11:29:37,611 log_parser.py:50 INFO] Sucessfully loaded MLPerf log from /home/cmuser/test_results/8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/offline/performance/run_1/mlperf_log_detail.txt.
Running: /usr/bin/python3 /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/dump-pip-freeze/dump.py
8c668fa856f8-nvidia_original-gpu-tensorrt-vdefault-default_config
+----------+----------+----------+---------+-----------------+---------------------------------+
|  Model   | Scenario | Accuracy |   QPS   | Latency (in ms) | Power Efficiency (in samples/J) |
+----------+----------+----------+---------+-----------------+---------------------------------+
| resnet50 | Offline  |    -     | 108.894 |        -        |                                 |
+----------+----------+----------+---------+-----------------+---------------------------------+
The MLPerf inference results are stored at /home/cmuser/test_results
! call "postprocess" from /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/run-mlperf-inference-app/customize.py
Path to the MLPerf inference benchmark reference sources: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference
Path to the MLPerf inference reference configuration file: /home/cmuser/CM/repos/local/cache/d8038bec49c846bf/inference/mlperf.conf
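For what it's worth, here is the quick sanity check I can run inside the container (my own ad-hoc snippet, not part of the CM flow) to see whether TensorFlow can find the GPU at all:

```bash
# Ad-hoc diagnostics inside the container (not part of CM):
# first confirm the GPU is exposed to the container at all,
# then ask TensorFlow which devices it can actually use.
nvidia-smi
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
```

An empty list from the second command would match the "Skipping registering GPU devices..." line above.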
@WarrenSchultz The first command (4.0) is the expected one to run inside the docker container. The TensorRT engine generation is failing, so the docker build itself succeeded, right? Last time this worked fine for me on an RTX 4090 (same architecture as the 4000 Ada, right?); let me give it a try again now.
Yup, run within the docker context. Thanks for checking.
I'm able to reproduce the error. I think we need to fix the onnx version. Let me give an update.
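For anyone hitting the same trace: onnx-graphsurgeon references onnx.TensorProto.FLOAT8E4M3FN, which (as far as I remember) only exists in onnx 1.13 and later, so an older onnx install raises exactly this AttributeError. A quick way to check inside the container:

```bash
# Check whether the installed onnx is new enough to define the FP8 enum
# that onnx-graphsurgeon references (my understanding: added around onnx 1.13).
python3 -c 'import onnx; print(onnx.__version__, hasattr(onnx.TensorProto, "FLOAT8E4M3FN"))'
```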
This was the issue. Please do `cm pull repo` inside the container and it should be fine.
@arjunsuresh That did the trick, thank you. The only oddity now is that it comes back as an invalid test, despite having --execution_mode=valid set?
cm run script --tags=run-mlperf,inference,_performance-only,_full \
   --division=open \
   --category=edge \
   --device=cuda \
   --model=resnet50 \
   --precision=float32 \
   --implementation=nvidia \
   --backend=tensorrt \
   --scenario=Offline \
   --execution_mode=valid \
   --power=no \
   --adr.python.version_min=3.8 \
   --clean \
   --compliance=no \
   --quiet \
   --time
MLPerf Results Summary
SUT name : LWIS_Server
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 13535.5
Result is : INVALID
Min duration satisfied : NO
Min queries satisfied : Yes
Early stopping satisfied: Yes
Recommendations:
* Increase expected QPS so the loadgen pre-generates a larger (coalesced) query.
You can pass in --offline_target_qps=13540 to the run command. Otherwise the run can go invalid if the automatically determined QPS is too low.
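That is, something along these lines (all other flags unchanged from your previous run; only the target QPS, taken from the summary above, is new):

```bash
cm run script --tags=run-mlperf,inference,_performance-only,_full \
   --division=open \
   --category=edge \
   --device=cuda \
   --model=resnet50 \
   --precision=float32 \
   --implementation=nvidia \
   --backend=tensorrt \
   --scenario=Offline \
   --execution_mode=valid \
   --offline_target_qps=13540 \
   --power=no \
   --adr.python.version_min=3.8 \
   --clean \
   --compliance=no \
   --quiet \
   --time
```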
That fixed it, thanks!
I have tried a variety of combinations of CUDA, cuDNN, and TensorRT versions, and several setup techniques, when trying to run ResNet50 v1.5 using CM on Ubuntu/WSL2/Windows 11/CUDA.
Most recently, I've been using this for reference, but still having problems, even using containers.
The failure condition I most commonly get follows:
Traceback (most recent call last):
  File "/home/cmuser/.local/bin/cm", line 8, in <module>
    sys.exit(run())
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 1485, in _run
    r = customize_code.preprocess(ii)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py", line 215, in preprocess
    r = cm.access(ii)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 4067, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module_misc.py", line 1772, in docker
    r = script_automation._run_deps(deps, [], env, {}, {}, {}, {}, '', {}, '', False, '', verbose, show_time, ' ', run_state)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 3038, in _run_deps
    r = self.cmind.access(ii)
  File "/home/cmuser/.local/lib/python3.10/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 211, in run
    r = self._run(i)
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/automation/script/module.py", line 727, in _run
    remembered_selections.append({'type': 'script',
AttributeError: 'dict' object has no attribute 'append'