mlcommons / inference_results_v2.0

This repository contains the results and code for the MLPerf™ Inference v2.0 benchmark.
https://mlcommons.org/en/inference-datacenter-20/
Apache License 2.0

ModuleNotFoundError: No module named 'nvidia' #19

Closed khushbuKinara closed 1 year ago

khushbuKinara commented 1 year ago

I am trying to build inference_results_v2.0 for resnet50 using the "make build" command on a Xavier NX with CUDA 10.2.

I get the following error:

`Building harness...
Warning: setting -Wno-deprecated-declarations to avoid header warnings
Building default harness...
Building Triton harness...
Skipping TRITON MIG harness for aarch64

Building 3D-UNet-KiTS19 harness...
Building RNN-T harness...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nvidia'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nvidia'
CMake Error at CMakeLists.txt:469 (message):
  Error acquiring DALI.  Verify, that you have DALI whl installed
Call Stack (most recent call first):
  CMakeLists.txt:508 (get_dali_paths)

-- Configuring incomplete, errors occurred!
See also "/home/jetson/projects/inference_results_v2.0/closed/NVIDIA/build/harness/CMakeFiles/CMakeOutput.log".
See also "/home/jetson/projects/inference_results_v2.0/closed/NVIDIA/build/harness/CMakeFiles/CMakeError.log".
Makefile:569: recipe for target 'build_harness' failed
make[1]: *** [build_harness] Error 1
make[1]: Leaving directory '/home/jetson/projects/inference_results_v2.0/closed/NVIDIA'
Makefile:471: recipe for target 'build' failed
make: *** [build] Error 2`

Further steps I tried: 1) I tried to install NVIDIA DALI using "pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ --upgrade nvidia-dali-cuda102". This failed with another error:

`Collecting nvidia-dali-cuda102
  Downloading nvidia-dali-cuda102-0.0.1.dev5.tar.gz (8.0 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-xj5jcd7e/nvidia-dali-cuda102_23c7b4e8e03a4b408c1c96f4690654bc/setup.py", line 150, in <module>
          raise RuntimeError(open("ERROR.txt", "r").read())
      RuntimeError:
      ###########################################################################################
      The package you are trying to install is only a placeholder project on PyPI.org repository.
      This package is hosted on NVIDIA Python Package Index.

      This package can be installed as:
      $ pip install nvidia-pyindex
      $ pip install nvidia-dali-cuda102

      Please refer to NVIDIA DALI installation guide for instructions:
      https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html
      ###########################################################################################
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed`


2) I then tried to build DALI from source:
git clone --recursive https://github.com/NVIDIA/DALI
cd DALI
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release ..

This again fails with an error:

`-- Build configuration: Release
grep: NVJPEG_INCLUDE_DIR-NOTFOUND/nvjpeg.h: No such file or directory
CMake Error at /home/jetson/.local/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
  Could NOT find NVJPEG: Found unsuitable version "", but required is at least "9.0" (found NVJPEG_INCLUDE_DIR-NOTFOUND)
Call Stack (most recent call first):
  /home/jetson/.local/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:456 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindNVJPEG.cmake:28 (find_package_handle_standard_args)
  cmake/Dependencies.cmake:24 (find_package)
  CMakeLists.txt:213 (include)

-- Configuring incomplete, errors occurred!
See also "/home/jetson/projects/inference_results_v2.0/closed/NVIDIA/DALI/build/CMakeFiles/CMakeOutput.log".`



Can somebody please help me resolve the above error?
Thanks in advance.
nvyihengz commented 1 year ago

Hi, could you try running this script to install the necessary dependencies, including DALI for Xavier? https://github.com/mlcommons/inference_results_v2.0/blob/master/closed/Azure/scripts/install_xavier_dependencies.sh

khushbuKinara commented 1 year ago

Hi @nvyihengz, I tried the above script to install the dependencies and got the errors below:

Traceback (most recent call last):
  File "/usr/local/DALI/dali/core/../../tools/stub_generator/stub_codegen.py", line 24, in <module>
    import clang.cindex
ModuleNotFoundError: No module named 'clang'
dali/core/CMakeFiles/dynlink_cuda.dir/build.make:82: recipe for target 'dali/core/dynlink_cuda_gen.cc' failed
make[2]: *** [dali/core/dynlink_cuda_gen.cc] Error 1
CMakeFiles/Makefile2:2203: recipe for target 'dali/core/CMakeFiles/dynlink_cuda.dir/all' failed
make[1]: *** [dali/core/CMakeFiles/dynlink_cuda.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 3%] Linking C static library libffts.a
[ 3%] Built target ffts_static
Makefile:170: recipe for target 'all' failed
make: *** [all] Error 2

Switched to a new branch 'release/7.2'
Branch 'release/7.2' set up to track remote branch 'release/7.2' from 'origin'.
rm -rf dist/ build/ onnx_graphsurgeon.egg-info/
python3 setup.py bdist_wheel
Traceback (most recent call last):
  File "setup.py", line 18, in <module>
    import onnx_graphsurgeon
  File "/tmp/TensorRT/tools/onnx-graphsurgeon/onnx_graphsurgeon/__init__.py", line 1, in <module>
    from onnx_graphsurgeon.exporters.onnx_exporter import export_onnx
  File "/tmp/TensorRT/tools/onnx-graphsurgeon/onnx_graphsurgeon/exporters/onnx_exporter.py", line 18, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
Makefile:26: recipe for target 'build' failed
make: *** [build] Error 1

Can you please help me further with this?

nvyihengz commented 1 year ago

The build is complaining that the clang and onnx Python modules are missing. Did you run the whole script? If so, could you try `sudo python3 -m pip install clang onnx`?
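
Before rerunning the whole script, it can help to check which Python modules the interpreter still cannot find. The following is a minimal sketch, not part of the install script: the helper name `missing_modules` is hypothetical, and it only reports what the interpreter running it can see, which may differ from the one the build uses.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when a parent package is absent,
            # e.g. "nvidia" when probing "nvidia.dali".
            found = False
        if not found:
            missing.append(name)
    return missing
```

For the errors reported in this thread, `missing_modules(["clang", "onnx", "nvidia.dali"])` would list whichever of those modules is still unavailable.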

khushbuKinara commented 1 year ago

@nvyihengz I tried that and was able to install all the dependencies, but I am still getting the DALI error.

Building RNN-T harness...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nvidia'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nvidia'
CMake Error at CMakeLists.txt:469 (message):
  Error acquiring DALI.  Verify, that you have DALI whl installed
Call Stack (most recent call first):
  CMakeLists.txt:508 (get_dali_paths)

nvyihengz commented 1 year ago

If pip/dpkg shows that DALI is installed, this would mean the DALI path is not being read properly by CMakeLists.txt, probably because of a non-default library path or similar. Could you check whether DALI is correctly installed?
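
One way to check this is to ask Python where it would load the package from. The `File "<string>"` frames in the traceback suggest that the CMake script runs an inline Python import of the `nvidia` package, but that is an inference from the log, not something confirmed from CMakeLists.txt. The sketch below uses a hypothetical helper, `locate_package`, and reports only what the interpreter running it can see.

```python
import importlib.util


def locate_package(name):
    """Return the directories (or file) a package would load from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # The parent package could not be imported, so the package is absent.
        return None
    if spec is None:
        return None
    if spec.submodule_search_locations is not None:
        # Regular and namespace packages expose their directories here.
        return list(spec.submodule_search_locations)
    # Plain modules only have an origin (a file path or "built-in").
    return [spec.origin]
```

Calling `locate_package("nvidia.dali")` and printing `sys.executable` alongside it shows both the install location and the interpreter in use. The logs in this thread mention both python3.6 and python3.8 site-packages directories, so it may be worth running the check with each interpreter.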

Since DALI takes quite some time to install, it is also possible that the installation failed silently. In that case I would need you to redirect the output of the dependency installation script to a file so that we can debug further.
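
Capturing the output could look like the sketch below. It is only an illustration: the helper name `run_and_log` and the log file name are assumptions, and a zero exit code does not rule out a silent failure, so the log still needs to be read.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_and_log(["bash", "install_xavier_dependencies.sh"], "install.log")` would run the script linked earlier in the thread and leave the full output in `install.log`.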

khushbuKinara commented 1 year ago

Yes, I overlooked the error; DALI is not installed yet. Here is the log from the installation:

-- Generating done
-- Build files have been written to: /usr/local/DALI/build
[ 0%] Running C++ protocol buffer compiler on proto/dali.proto
Scanning dependencies of target cocoapi
[ 1%] Building C object CMakeFiles/cocoapi.dir/third_party/cocoapi/common/maskApi.c.o
Scanning dependencies of target DALI_PROTO
[ 1%] Building CXX object dali/pipeline/CMakeFiles/DALI_PROTO.dir/dali.pb.cc.o
[ 1%] Linking C static library libcocoapi.a
[ 1%] Built target cocoapi
Scanning dependencies of target ffts_static
[ 1%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts.c.o
[ 1%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_chirp_z.c.o
[ 2%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_nd.c.o
[ 2%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_real.c.o
[ 2%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_real_nd.c.o
[ 2%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_transpose.c.o
[ 3%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_trig.c.o
[ 3%] Building C object third_party/ffts/CMakeFiles/ffts_static.dir/src/ffts_static.c.o
/usr/local/DALI/third_party/ffts/src/ffts_static.c:239:36: warning: ‘ffts_constants_inv_64f’ defined but not used [-Wunused-const-variable=]
 static const FFTS_ALIGN(16) double ffts_constants_inv_64f[16] = {
                                    ^~~~~~~~~~~~~~~~~~~~~~
/usr/local/DALI/third_party/ffts/src/ffts_static.c:195:36: warning: ‘ffts_constants_64f’ defined but not used [-Wunused-const-variable=]
 static const FFTS_ALIGN(16) double ffts_constants_64f[16] = {
                                    ^~~~~~~~~~~~~~~~~~
/usr/local/DALI/third_party/ffts/src/ffts_static.c:141:36: warning: ‘ffts_constants_small_inv_64f’ defined but not used [-Wunused-const-variable=]
 static const FFTS_ALIGN(16) double ffts_constants_small_inv_64f[24] = {
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/DALI/third_party/ffts/src/ffts_static.c:77:36: warning: ‘ffts_constants_small_64f’ defined but not used [-Wunused-const-variable=]
 static const FFTS_ALIGN(16) double ffts_constants_small_64f[24] = {
                                    ^~~~~~~~~~~~~~~~~~~~~~~~
[ 3%] Built target DALI_PROTO
[ 3%] Running cuda.h stub generator
Traceback (most recent call last):
  File "/home/jetson/.local/lib/python3.8/site-packages/clang/cindex.py", line 4136, in register_function
    func = getattr(lib, item[0])
  File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/aarch64-linux-gnu/libclang-16.so: undefined symbol: clang_CXXMethod_isDeleted

I installed clang using the command you mentioned (pip install clang).

$ clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

While running, the dependency script looks for libclang-16.so, but I only have libclang-6.0.so.1, so I symlinked libclang-16.so to it:

$ ls -lrth /usr/lib/aarch64-linux-gnu/libclang*
-rw-r--r-- 1 root root 23M Apr  6  2018 /usr/lib/aarch64-linux-gnu/libclang-6.0.so.1
lrwxrwxrwx 1 root root  17 Apr  6  2018 /usr/lib/aarch64-linux-gnu/libclang-6.0.so -> libclang-6.0.so.1
lrwxrwxrwx 1 root root  44 May 30 23:41 /usr/lib/aarch64-linux-gnu/libclang.so -> /usr/lib/aarch64-linux-gnu/libclang-6.0.so.1
lrwxrwxrwx 1 root root  44 Jun  1 01:07 /usr/lib/aarch64-linux-gnu/libclang-16.so -> /usr/lib/aarch64-linux-gnu/libclang-6.0.so.1

nvyihengz commented 1 year ago

DALI is separately maintained software hosted at https://github.com/NVIDIA/DALI. My guess is that you are using a clang version that is not compatible with the legacy DALI version. I would suggest consulting the DALI team. Thanks.
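
A version mismatch of this kind can be spotted by comparing the major version of the Python clang bindings with that of the installed clang. The logs in this thread show the bindings looking for libclang-16.so while `clang --version` reports 6.0.0. The sketch below is an illustration only: the helper names are hypothetical, and matching major versions is a rule of thumb, not a guarantee of compatibility with DALI.

```python
import re


def major_version(text):
    """Return the first major version number found in a version string, or None."""
    match = re.search(r"(\d+)\.\d+", text)
    return int(match.group(1)) if match else None


def versions_match(bindings_version, clang_version_output):
    """True when the bindings and the clang output report the same major version."""
    bindings_major = major_version(bindings_version)
    clang_major = major_version(clang_version_output)
    return bindings_major is not None and bindings_major == clang_major
```

On Python 3.8 or later, the bindings version could be read with `importlib.metadata.version("clang")`, assuming the bindings were installed from the pip distribution named `clang` as described above, and compared against the first line of `clang --version`.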

khushbuKinara commented 1 year ago

Thanks for the help @nvyihengz. Really appreciated.