mlcommons / inference_results_v2.0

This repository contains the results and code for the MLPerf™ Inference v2.0 benchmark.
https://mlcommons.org/en/inference-datacenter-20/
Apache License 2.0

[NVIDIA] TensorRT bindings for Python 3.8 broken? #2

Closed psyhtest closed 2 years ago

psyhtest commented 2 years ago

I've been trying to preprocess the ImageNet validation dataset on a Jetson AGX Xavier with JetPack 4.6. To my surprise, what used to work with the v1.0 release:

anton@xavier:~$ python3.8 -m pip install --upgrade pip
anton@xavier:~$ python3.8 -m pip install setuptools --upgrade
anton@xavier:~$ python3.8 -m pip install opencv-python six typing

anton@xavier:~$ cd /datasets/inference_results_v1.0/closed/NVIDIA
anton@xavier:/datasets/inference_results_v1.0/closed/NVIDIA$ mkdir build
anton@xavier:/datasets/inference_results_v1.0/closed/NVIDIA$ mkdir build/data
anton@xavier:/datasets/inference_results_v1.0/closed/NVIDIA$ cp -r /datasets/dataset-imagenet-ilsvrc2012-val build/data/imagenet
anton@xavier:/datasets/inference_results_v1.0/closed/NVIDIA$ python3.8 code/resnet50/tensorrt/preprocess_data.py

(and still does!), does not work with the v2.0 release.

psyhtest commented 2 years ago

On the one hand, the Python 3.8 error is understood because TensorRT is only natively installed for Python 3.6. On the other hand, TensorRT should not be required to preprocess any data.
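One way to decouple preprocessing from TensorRT is to make the import optional. This is only a sketch of the pattern (not the repository's actual code; `require_trt` is a hypothetical helper):

```python
# Sketch only (not the repository's actual code): make the TensorRT import
# optional so that preprocessing-only scripts still run on interpreters
# without the bindings, e.g. Python 3.8 on JetPack 4.6.
try:
    import tensorrt as trt  # needed only for engine building / calibration
except ModuleNotFoundError:
    trt = None

def require_trt():
    """Fail with a clear message only when TensorRT is actually needed."""
    if trt is None:
        raise RuntimeError(
            "TensorRT Python bindings are not installed for this "
            "interpreter; data preprocessing should not need them")
    return trt

print("tensorrt importable:", trt is not None)
```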

psyhtest commented 2 years ago

I've got similar errors with the v1.1 release.

psyhtest commented 2 years ago

The same problem seems to prevent automatic platform detection:

katya@xavier:/datasets/inference_results_v1.1/closed/NVIDIA$ python3.6 scripts/get_system_id.py
Traceback (most recent call last):
  File "scripts/get_system_id.py", line 21, in <module>
    from code.common import get_system
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/__init__.py", line 22, in <module>
    from code.common import logging
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/__init__.py", line 33, in <module>
    from code.common.system_list import KnownSystems, MIGConfiguration
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/system_list.py", line 22, in <module>
    from code.common.constants import CPUArch
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/constants.py", line 15
    from __future__ import annotations
    ^
SyntaxError: future feature annotations is not defined

katya@xavier:/datasets/inference_results_v1.1/closed/NVIDIA$ python3.8 scripts/get_system_id.py
Traceback (most recent call last):
  File "scripts/get_system_id.py", line 21, in <module>
    from code.common import get_system
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/__init__.py", line 22, in <module>
    from code.common import logging
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/__init__.py", line 33, in <module>
    from code.common.system_list import KnownSystems, MIGConfiguration
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/system_list.py", line 22, in <module>
    from code.common.constants import CPUArch
  File "/datasets/inference_results_v1.1/closed/NVIDIA/code/common/constants.py", line 27, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

so it has to be manually forced:

katya@xavier:/datasets/inference_results_v1.1/closed/NVIDIA$ git diff Makefile
diff --git a/closed/NVIDIA/Makefile b/closed/NVIDIA/Makefile
index 4e371a92..676bf377 100644
--- a/closed/NVIDIA/Makefile
+++ b/closed/NVIDIA/Makefile
@@ -36,7 +36,7 @@ SUBMITTER ?= NVIDIA
 SYSTEM_NAME ?= $(shell $(PYTHON3_CMD) scripts/get_system_id.py 2> /dev/null)
 TARGET_X86_64 := 0
 TARGET_AARCH64 := 0
-IS_XAVIER := 0
+IS_XAVIER := 1
 ifeq ($(ARCH), x86_64)
     TARGET_X86_64 = 1
 endif
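Instead of patching the Makefile, the board could in principle be detected from the device tree. This is a hypothetical fallback of my own, not something the NVIDIA scripts actually do:

```shell
# Hypothetical fallback (not part of NVIDIA's scripts): detect a Jetson
# Xavier from the device-tree model string instead of editing the Makefile.
MODEL_FILE=/proc/device-tree/model
if [ -r "$MODEL_FILE" ] && grep -qi xavier "$MODEL_FILE"; then
    IS_XAVIER=1
else
    IS_XAVIER=0
fi
echo "IS_XAVIER=$IS_XAVIER"
```

On a non-Jetson machine the file does not exist and the flag stays 0, which matches the Makefile's default.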
psyhtest commented 2 years ago

I'm beginning to suspect that the installation of dependencies has not been completely successful. Will try to rerun it.

The README says:

JetPack 4.6 (21.08 Jetson CUDA-X AI Developer Preview)

  • Includes TensorRT 8.0.1.6
  • Includes cuDNN 8.2.3.8 (If you are using the production Jetpack 4.6, this is the only package you need to install before running the dependency script below)
  • Dependencies can be installed by running the script located at closed/NVIDIA/scripts/install_xavier_dependencies.sh

Is this up-to-date? JetPack 5.0 is supposed to be used in the v2.0 round, and the script is not available.

psyhtest commented 2 years ago

Luckily, this script is available in e.g. Azure's submission... alongside similar scripts for Orin:

anton@xavier:/datasets/inference_results_v2.0/closed/Azure$ ls -la scripts/install_*
-rw-rw-r-- 1 anton krai 2412 May 16 14:40 scripts/install_orin_auto_dependencies_internal.sh
-rw-rw-r-- 1 anton krai 7050 May 16 14:40 scripts/install_orin_auto_dependencies.sh
-rw-rw-r-- 1 anton krai 1619 May 16 14:40 scripts/install_orin_jetson_dependencies_internal.sh
-rw-rw-r-- 1 anton krai 6842 May 16 14:40 scripts/install_orin_jetson_dependencies.sh
-rw-rw-r-- 1 anton krai 7029 May 16 14:40 scripts/install_xavier_dependencies.sh
psyhtest commented 2 years ago

The install_xavier_dependencies.sh script is practically the same as for v1.1:

anton@xavier:~$ diff \
/datasets/inference_results_v1.1/closed/NVIDIA/scripts/install_xavier_dependencies.sh \
/datasets/inference_results_v2.0/closed/Azure/scripts/install_xavier_dependencies.sh
2c2
< # Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
---
> # Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.

I suspect it might need to be updated, e.g. TensorRT should probably be v8.0, not v7.2.

psyhtest commented 2 years ago

It appears the v1.1/v2.0 version of install_xavier_dependencies.sh is incomplete: it attempts to switch to Python 3.8 on JetPack 4.6, but does not build Python 3.8 bindings for TensorRT. I've attempted to do it myself following instructions here, but hit a brick wall. (For starters, there's no aarch64 deb package for Python 3.8.)

psyhtest commented 2 years ago

I've gone a bit further with the following steps:

export EXT_PATH=~/external
mkdir -p $EXT_PATH
cd $EXT_PATH

git clone https://github.com/pybind/pybind11.git

wget https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
tar xvzf Python-3.8.13.tgz
mkdir python3.8
cp -r Python-3.8.13/Include/ python3.8/include
cp /usr/include/aarch64-linux-gnu/python3.8/pyconfig.h python3.8/include/

git clone https://github.com/NVIDIA/TensorRT
cd TensorRT
git checkout release/8.0
cd python
EXT_PATH=~/external CPATH=~/external/pybind11/include TRT_OSSPATH=~/external/TensorRT \
PYTHON_MAJOR_VERSION=3 PYTHON_MINOR_VERSION=8 TARGET_ARCHITECTURE=aarch64 \
./build.sh

The build log, however, is not clean:

~/external/TensorRT/python/build ~/external/TensorRT/python
ForwardDeclarations.h utils.h
-- Configuring done
-- Generating done
-- Build files have been written to: /home/anton/external/TensorRT/python/build
[  8%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyCore.cpp.o
[ 33%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyInt8.cpp.o
[ 33%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyFoundationalTypes.cpp.o
[ 33%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyAlgorithmSelector.cpp.o
[ 41%] Building CXX object CMakeFiles/tensorrt.dir/src/parsers/pyOnnx.cpp.o
[ 50%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyPlugin.cpp.o
[ 58%] Building CXX object CMakeFiles/tensorrt.dir/src/infer/pyGraph.cpp.o
[ 66%] Building CXX object CMakeFiles/tensorrt.dir/src/parsers/pyCaffe.cpp.o
[ 75%] Building CXX object CMakeFiles/tensorrt.dir/src/pyTensorRT.cpp.o
[ 83%] Building CXX object CMakeFiles/tensorrt.dir/src/utils.cpp.o
[ 91%] Building CXX object CMakeFiles/tensorrt.dir/src/parsers/pyUff.cpp.o
In file included from /home/anton/external/TensorRT/python/include/ForwardDeclarations.h:18:0,
                 from /home/anton/external/TensorRT/python/src/infer/pyInt8.cpp:18:
/home/anton/external/TensorRT/python/src/infer/pyInt8.cpp: In member function ‘virtual const void* tensorrt::pyIInt8LegacyCalibrator::readHistogramCache(std::size_t&)’:
/home/anton/external/pybind11/include/pybind11/pybind11.h:2742:70: error: void value not ignored as it ought to be
             return pybind11::detail::cast_safe<ret_type>(std::move(o));                           \
                                                                      ^
/home/anton/external/pybind11/include/pybind11/pybind11.h:2776:9: note: in expansion of macro ‘PYBIND11_OVERRIDE_IMPL’
         PYBIND11_OVERRIDE_IMPL(PYBIND11_TYPE(ret_type), PYBIND11_TYPE(cname), name, __VA_ARGS__); \
         ^~~~~~~~~~~~~~~~~~~~~~
/home/anton/external/pybind11/include/pybind11/pybind11.h:2835:5: note: in expansion of macro ‘PYBIND11_OVERRIDE_PURE_NAME’
     PYBIND11_OVERRIDE_PURE_NAME(                                                                  \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/anton/external/TensorRT/python/src/infer/pyInt8.cpp:125:9: note: in expansion of macro ‘PYBIND11_OVERLOAD_PURE_NAME’
         PYBIND11_OVERLOAD_PURE_NAME(
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
CMakeFiles/tensorrt.dir/build.make:133: recipe for target 'CMakeFiles/tensorrt.dir/src/infer/pyInt8.cpp.o' failed
make[2]: *** [CMakeFiles/tensorrt.dir/src/infer/pyInt8.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....

CMakeFiles/Makefile2:94: recipe for target 'CMakeFiles/tensorrt.dir/all' failed
make[1]: *** [CMakeFiles/tensorrt.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2
Generating python 3.8 bindings for TensorRT 8.0.1.6
~/external/TensorRT/python/packaging ~/external/TensorRT/python/build ~/external/TensorRT/python
~/external/TensorRT/python/build ~/external/TensorRT/python
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
~/external/TensorRT/python

and even after successful installation:

$ python3 -m pip install build/dist/tensorrt-*.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing ./build/dist/tensorrt-8.0.1.6-cp38-none-linux_aarch64.whl
Installing collected packages: tensorrt
Successfully installed tensorrt-8.0.1.6

$ python3 -m pip show tensorrt
Name: tensorrt
Version: 8.0.1.6
Summary: A high performance deep learning inference library
Home-page: 
Author: NVIDIA
Author-email: 
License: Proprietary
Location: /home/anton/.local/lib/python3.8/site-packages
Requires: 
Required-by: 

importing tensorrt fails:

$ python3 -c "import tensorrt"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/anton/.local/lib/python3.8/site-packages/tensorrt/__init__.py", line 36, in <module>
    from .tensorrt import *
ModuleNotFoundError: No module named 'tensorrt.tensorrt'
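The error suggests the wheel installed the pure-Python `tensorrt/__init__.py` but not the compiled `tensorrt.tensorrt` extension. A quick way to check is to list the native extension files inside the installed package. This is a generic diagnostic sketch, not an NVIDIA tool:

```python
# Diagnostic sketch: list native extension files (.so/.pyd) inside an
# installed package. For the tensorrt wheel above, an empty list would
# explain the failure: the pure-Python __init__.py is installed, but the
# compiled tensorrt.tensorrt module never made it into the wheel.
import importlib.util
import pathlib

def native_extensions(package: str):
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    pkg_dir = pathlib.Path(next(iter(spec.submodule_search_locations)))
    return sorted(p.name for pattern in ("*.so", "*.pyd")
                  for p in pkg_dir.glob(pattern))

# A pure-Python stdlib package has no native extensions:
print(native_extensions("email"))  # → []
```

Running `native_extensions("tensorrt")` after installing the wheel would show whether the `.so` is actually present.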
psyhtest commented 2 years ago

I've pinned that failure down to a pybind11 commit after v2.6.2, the version recommended for Xavier. Checking out v2.6.2 makes it work.

So the complete sequence of instructions is:

export EXT_PATH=/tmp/tensorrt-bindings
rm -rf $EXT_PATH
mkdir $EXT_PATH
cd $EXT_PATH

git clone https://github.com/pybind/pybind11.git
cd pybind11
git checkout v2.6.2
cd $EXT_PATH

wget https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
tar xvzf Python-3.8.13.tgz
mkdir python3.8
cp -r Python-3.8.13/Include/ python3.8/include
cp /usr/include/aarch64-linux-gnu/python3.8/pyconfig.h python3.8/include/

git clone https://github.com/NVIDIA/TensorRT
cd TensorRT
git checkout release/8.0
cd python
CPATH=$EXT_PATH/pybind11/include TRT_OSSPATH=$EXT_PATH/TensorRT \
PYTHON_MAJOR_VERSION=3 PYTHON_MINOR_VERSION=8 TARGET=aarch64 \
./build.sh

python3 -m pip uninstall tensorrt -y
python3 -m pip install build/dist/tensorrt-*.whl
cd $EXT_PATH

python3 -c "import tensorrt"
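One extra sanity check worth doing before the `pip install` step (my suggestion, not part of the sequence above) is confirming that the wheel's ABI tag matches the target interpreter; the wheel built above is tagged `cp38`:

```shell
# The cp38 tag in tensorrt-8.0.1.6-cp38-none-linux_aarch64.whl must match
# the interpreter the wheel is installed into, or the compiled module
# will not import.
python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```

If this prints anything other than `cp38`, pip would either refuse the wheel or install it into a mismatched interpreter.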
nv-ananjappa commented 2 years ago

@psyhtest Are your problems on Xavier resolved now?

psyhtest commented 2 years ago

Yes, thanks @nv-ananjappa.

Zack0617 commented 2 years ago

I meet the same issue on a Xavier NX Devkit with JetPack 4.6 and can't solve it following your instructions.

@psyhtest Could you give me more help? Thank you!

importing tensorrt fails:

zack@zack-desktop:~/inference_results_v2.0/closed/NVIDIA$ python3 -c "import tensorrt"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zack/.local/lib/python3.8/site-packages/tensorrt/__init__.py", line 36, in <module>
    from .tensorrt import *
ModuleNotFoundError: No module named 'tensorrt.tensorrt'

EdisonPricehan commented 1 year ago


Thanks a lot. The overall procedure works on my Jetson Nano 2GB. The only thing I modified is inside build.sh, where I changed the number of jobs from -j12 to -j2, since the Nano only has 4 cores; otherwise the build freezes.
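The job count could also be derived from the core count rather than hard-coded. A sketch, assuming GNU coreutils `nproc` is available:

```shell
# Sketch: derive a parallel build job count from the core count instead of
# hard-coding -j12 in build.sh. On memory-constrained boards (e.g. a
# Nano 2GB), using fewer jobs than cores avoids the freezes described above.
CORES=$(nproc)
if [ "$CORES" -le 4 ]; then
    JOBS=2              # small boards: stay conservative
else
    JOBS=$((CORES - 1)) # leave one core for the rest of the system
fi
echo "make -j$JOBS"
```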