mlcommons / inference_results_v0.5

This repository contains the results and code for the MLPerf™ Inference v0.5 benchmark.
https://mlcommons.org/en/inference-datacenter-05/
Apache License 2.0

openmpi issue while running pytorch model. #40

Closed. sandip761 closed this issue 4 years ago.

sandip761 commented 4 years ago

[2020-09-03 14:34:07,428 main.py:291 INFO] Using config files: measurements/Xavier/ssd-large/SingleStream/config.json
[2020-09-03 14:34:07,429 __init__.py:142 INFO] Parsing config file measurements/Xavier/ssd-large/SingleStream/config.json ...
[2020-09-03 14:34:07,430 main.py:295 INFO] Processing config "Xavier_ssd-large_SingleStream"
[2020-09-03 14:34:07,666 main.py:83 INFO] Building engines for ssd-large benchmark in SingleStream scenario...
[2020-09-03 14:34:07,671 main.py:100 INFO] Building GPU engine for Xavier_ssd-large_SingleStream
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 101, in handle_generate_engine
    b = get_benchmark(benchmark_name, config)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 42, in get_benchmark
    SSDResNet34 = import_module("code.ssd-large.tensorrt.SSDResNet34").SSDResNet34
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/SSDResNet34.py", line 37, in <module>
    load_torch_weights = import_module("code.ssd-large.tensorrt.utils").load_torch_weights
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/utils.py", line 19, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
[2020-09-03 14:34:08,674 main.py:83 INFO] Building engines for ssd-large benchmark in SingleStream scenario...
[2020-09-03 14:34:08,678 main.py:100 INFO] Building GPU engine for Xavier_ssd-large_SingleStream
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 101, in handle_generate_engine
    b = get_benchmark(benchmark_name, config)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 42, in get_benchmark
    SSDResNet34 = import_module("code.ssd-large.tensorrt.SSDResNet34").SSDResNet34
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/SSDResNet34.py", line 37, in <module>
    load_torch_weights = import_module("code.ssd-large.tensorrt.utils").load_torch_weights
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/utils.py", line 19, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
[2020-09-03 14:34:09,628 main.py:83 INFO] Building engines for ssd-large benchmark in SingleStream scenario...
[2020-09-03 14:34:09,632 main.py:100 INFO] Building GPU engine for Xavier_ssd-large_SingleStream
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 101, in handle_generate_engine
    b = get_benchmark(benchmark_name, config)
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/main.py", line 42, in get_benchmark
    SSDResNet34 = import_module("code.ssd-large.tensorrt.SSDResNet34").SSDResNet34
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/SSDResNet34.py", line 37, in <module>
    load_torch_weights = import_module("code.ssd-large.tensorrt.utils").load_torch_weights
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/blaize-xavier-nx/inference_results_v0.5/closed/NVIDIA/code/ssd-large/tensorrt/utils.py", line 19, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "code/main.py", line 327, in <module>
    main()
  File "code/main.py", line 317, in main
    launch_handle_generate_engine(benchmark_name, benchmark_conf, need_gpu, need_dla)
  File "code/main.py", line 80, in launch_handle_generate_engine
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
Makefile:298: recipe for target 'generate_engines' failed
make: *** [generate_engines] Error 1
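All three engine-build attempts fail at the same point: `import torch` in utils.py reaches `from torch._C import *`, and the dynamic loader cannot resolve `libmpi_cxx.so.20`, so no TensorRT work starts. A minimal diagnostic sketch, assuming Python 3 on the same Xavier NX, that checks the two failing steps in isolation from the MLPerf harness. The helper names `check_shared_library` and `check_import` are hypothetical and not part of the repository; only standard-library `ctypes` and `importlib` calls are used.

```python
import ctypes
import ctypes.util
import importlib


def check_shared_library(soname):
    """Ask the dynamic loader to open `soname`; return (ok, detail).

    Hypothetical helper: a failure here is the same loader error that
    surfaces as the ImportError in the log.
    """
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, "loaded " + soname


def check_import(module_name):
    """Import `module_name` on its own; return (ok, detail).

    Hypothetical helper: reproduces `import torch` (utils.py line 19)
    without going through code/main.py and its engine-build retries.
    """
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        return False, str(exc)
    return True, "imported " + module_name


if __name__ == "__main__":
    # The exact soname named in the ImportError.
    print(check_shared_library("libmpi_cxx.so.20"))
    # What the linker cache offers for the base name (None if nothing is found).
    print("find_library('mpi_cxx') ->", ctypes.util.find_library("mpi_cxx"))
    # The import that fails in every Process-N traceback.
    print(check_import("torch"))
```

Run it with the same `python3` interpreter the Makefile uses. If the first check fails while `find_library` returns a differently versioned name, that would suggest the installed OpenMPI does not match the one the torch build was linked against; if it returns `None`, the library is likely not installed or not on the loader's search path.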