mlcommons / inference_results_v1.1

This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark.
https://mlcommons.org/en/inference-datacenter-11/
Apache License 2.0

Cannot run "make build" on Xavier AGX #13

Open · JoachimMoe opened 1 year ago

JoachimMoe commented 1 year ago

After following the README, the next step was to run make build. This, however, fails on the Xavier AGX when it tries to build the Triton server, which is somewhat expected. The following error is thrown:

[ 81%] Building CXX object src/core/CMakeFiles/server-library.dir/tritonserver.cc.o
cd /media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/src/core && /usr/bin/c++ -DTRITON_ENABLE_CUDA_GRAPH=1 -DTRITON_ENABLE_GPU=1 -DTRITON_ENABLE_LOGGING=1 -DTRITON_ENABLE_TENSORRT=1 -DTRITON_MIN_COMPUTE_CAPABILITY=6.0 -DTRITON_VERSION=\"2.13.0dev\" -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/build/server/../.. -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/third-party/protobuf/include -I/usr/local/cuda/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/third-party/cnmem/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-common-build/protobuf -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-core-src/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-common-src/include -Wall -Wextra -Wno-unused-parameter -Werror -Wno-deprecated-declarations -O3 -fPIC -std=gnu++11 -o CMakeFiles/server-library.dir/tritonserver.cc.o -c /media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/src/core/tritonserver.cc
[ 82%] Building CXX object src/core/CMakeFiles/server-library.dir/cuda_memory_manager.cc.o
cd /media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/src/core && /usr/bin/c++ -DTRITON_ENABLE_CUDA_GRAPH=1 -DTRITON_ENABLE_GPU=1 -DTRITON_ENABLE_LOGGING=1 -DTRITON_ENABLE_TENSORRT=1 -DTRITON_MIN_COMPUTE_CAPABILITY=6.0 -DTRITON_VERSION=\"2.13.0dev\" -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/build/server/../.. -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/third-party/protobuf/include -I/usr/local/cuda/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/third-party/cnmem/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-common-build/protobuf -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-core-src/include -I/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server/_deps/repo-common-src/include -Wall -Wextra -Wno-unused-parameter -Werror -Wno-deprecated-declarations -O3 -fPIC -std=gnu++11 -o CMakeFiles/server-library.dir/cuda_memory_manager.cc.o -c /media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/src/core/cuda_memory_manager.cc
make[8]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server'
make[7]: *** [CMakeFiles/Makefile2:851: src/backends/tensorrt/CMakeFiles/tensorrt-backend-library.dir/all] Error 2
make[7]: *** Waiting for unfinished jobs....
make[8]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server'
[ 82%] Built target server-library
make[7]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server'
make[6]: *** [Makefile:149: all] Error 2
make[6]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build/server'
make[5]: *** [CMakeFiles/server.dir/build.make:133: server/src/server-stamp/server-build] Error 2
make[5]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[4]: *** [CMakeFiles/Makefile2:150: CMakeFiles/server.dir/all] Error 2
make[4]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[3]: *** [CMakeFiles/Makefile2:157: CMakeFiles/server.dir/rule] Error 2
make[3]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
make[2]: *** [Makefile:137: server] Error 2
make[2]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/build/triton-inference-server/out/tritonserver/build'
error: make server failed
make[1]: *** [Makefile:485: build_triton] Error 1
make[1]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA'
make: *** [Makefile:447: build] Error 2

I have run the install_xavier_dependencies.sh script and done a pip3 install -r requirements_xavier.txt.
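
As a sanity check on what those steps actually installed, the package versions can be read without importing the packages themselves (which matters here, since importing onnx is exactly what crashes below), e.g. with a quick snippet like this:

```python
# Quick diagnostic: report installed package versions via metadata only,
# so a broken onnx import cannot crash the check (Python 3.8+ stdlib).
from importlib.metadata import version

for pkg in ("numpy", "onnx"):
    print(pkg, version(pkg))
```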

Now, when I try to run anything else, such as make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline,singlestream", it throws the following:

make[1]: Entering directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA'
[2023-09-06 19:29:33,883 main.py:760 INFO] Detected System ID: AGX_Xavier
[2023-09-06 19:29:35,437 main.py:108 INFO] Building engines for resnet50 benchmark in Offline scenario...
[2023-09-06 19:29:35,441 main.py:117 INFO] Building DLA engine for AGX_Xavier_resnet50_Offline
/home/jetson/.local/lib/python3.8/site-packages/onnx/mapping.py:27: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
  int(TensorProto.STRING): np.dtype(np.object)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/code/main.py", line 118, in handle_generate_engine
    b = get_benchmark(config)
  File "/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/code/__init__.py", line 80, in get_benchmark
    cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
  File "/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/code/__init__.py", line 63, in get_cls
    return getattr(import_module(module_loc.module_path), module_loc.cls_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/media/nvmedrive/inference_results_v1.1/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py", line 20, in <module>
    import onnx
  File "/home/jetson/.local/lib/python3.8/site-packages/onnx/__init__.py", line 20, in <module>
    import onnx.helper  # noqa
  File "/home/jetson/.local/lib/python3.8/site-packages/onnx/helper.py", line 17, in <module>
    from onnx import mapping
  File "/home/jetson/.local/lib/python3.8/site-packages/onnx/mapping.py", line 27, in <module>
    int(TensorProto.STRING): np.dtype(np.object)
  File "/home/jetson/.local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. 
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Traceback (most recent call last):
  File "code/main.py", line 763, in <module>
    main(main_args, system)
  File "code/main.py", line 736, in main
    dispatch_action(main_args, config_dict, workload_id, equiv_engine_setting=equiv_engine_setting)
  File "code/main.py", line 556, in dispatch_action
    launch_handle_generate_engine(*_gen_args, **_gen_kwargs)
  File "code/main.py", line 92, in launch_handle_generate_engine
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:619: generate_engines] Error 1
make[1]: Leaving directory '/media/nvmedrive/inference_results_v1.1/closed/NVIDIA'
make: *** [Makefile:613: run] Error 2
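
This second failure looks like a plain version mismatch rather than anything Xavier-specific: the onnx release pulled in by requirements_xavier.txt still evaluates np.dtype(np.object) at import time, and NumPy deprecated the np.object alias in 1.20 and deleted it in 1.24, so any NumPy >= 1.24 makes import onnx fail exactly as in the traceback. Pinning NumPy back (pip3 install "numpy<1.24") should sidestep it. Alternatively, a small shim placed before the first import of onnx restores the alias; this is my own workaround sketch, not anything shipped with the repo:

```python
# Workaround sketch (not part of the repo): restore the np.object alias
# that NumPy 1.24 removed, before onnx/mapping.py tries to use it.
import numpy as np

if not hasattr(np, "object"):
    # np.object was always just an alias for the builtin `object`, so
    # re-adding it is behavior-preserving; on NumPy < 1.24 this is a no-op.
    np.object = object

import onnx  # now imports without the AttributeError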