Closed by WarrenSchultz 1 week ago
Thanks for reporting this. The problem should be fixed now. We typically launch one Docker image for the NVIDIA implementation and run all the benchmarks there, so we missed this issue for 3d-unet.
Seems to be working now, thanks!
Running the command for ResNet50 works correctly:
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no
But 3d-unet-99 fails:
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=3d-unet-99 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no
Error log: `Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
total_engine_build_time += self.build_engine(job)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
builder = get_benchmark(job.config)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
return getattr(import_module(module_loc.module_path), module_loc.cls_name)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
import onnx
ModuleNotFoundError: No module named 'onnx'
[2024-06-19 10:30:07,499 generate_engines.py:173 INFO] Building engines for 3d-unet benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so
Loading TensorRT plugin from build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so
Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
total_engine_build_time += self.build_engine(job)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
builder = get_benchmark(job.config)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
return getattr(import_module(module_loc.module_path), module_loc.cls_name)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
import onnx
ModuleNotFoundError: No module named 'onnx'
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 231, in <module>
main(main_args, DETECTED_SYSTEM)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 144, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
handler.run()
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 186, in handle_failure
self.action_handler.handle_failure()
File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 184, in handle_failure
raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [Makefile:37: generate_engines] Error 1
CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)`
However, running 3d-unet-99 within the container built for ResNet50 works correctly.
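In case it helps anyone comparing the two containers: the traceback points at a missing `onnx` module, so a quick check of which modules the container's Python can actually import may narrow things down. This is only a minimal sketch using the standard library; the helper name `missing_modules` is my own and not part of CM or the NVIDIA code, and it assumes you run it with the same interpreter the benchmark uses (Python 3.8 in the log above).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the entries of `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names whose parent package is absent,
            # or for modules with a broken __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# `onnx` is the module named in the traceback; add others as needed.
print("interpreter:", sys.executable)
print("missing:", missing_modules(["onnx"]) or "none")
```

Running this in both the 3d-unet-99 container and the ResNet50 container should show whether `onnx` is only present in the latter, which would be consistent with the behaviour described above.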