[Build] Docker build with enable_cuda_profiling failing

Describe the issue

I'm trying to build onnxruntime with CUDA profiling enabled. I use Dockerfile.cuda from the rel-1.16.3 branch with the --enable_cuda_profiling argument added, and the build fails. In Docker, it fails in the final RUN step of the build stage (the pip install followed by the build.sh call). I've also tried building locally several times without the Dockerfile, and that fails too, while compiling flash_fwd_split_hdim128_fp16_sm80.cu.

Urgency

No response

Target platform

Linux 5.15.0-89-generic x86_64

Build script

FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
ENV     DEBIAN_FRONTEND=noninteractive
MAINTAINER Changming Sun "chasun@microsoft.com"
ADD . /code

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
RUN apt-get update && apt-get install -y --no-install-recommends python3-dev ca-certificates g++ python3-numpy gcc make git python3-setuptools python3-wheel python3-packaging python3-pip aria2 && aria2c -q -d /tmp -o cmake-3.26.3-linux-x86_64.tar.gz https://github.com/Kitware/CMake/releases/download/v3.26.3/cmake-3.26.3-linux-x86_64.tar.gz && tar -zxf /tmp/cmake-3.26.3-linux-x86_64.tar.gz --strip=1 -C /usr

RUN cd /code && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && /bin/bash ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --enable_cuda_profiling --config Release --build_wheel --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86'

FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
ENV     DEBIAN_FRONTEND=noninteractive
COPY --from=0 /code/build/Linux/Release/dist /root
COPY --from=0 /code/dockerfiles/LICENSE-IMAGE.txt /code/LICENSE-IMAGE.txt
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends libstdc++6 ca-certificates python3-setuptools python3-wheel python3-pip unattended-upgrades && unattended-upgrade && python3 -m pip install /root/*.whl && rm -rf /root/*.whl

Error / output

[+] Building 642.6s (10/13)                                                                                                             
 => [internal] load build definition from Dockerfile.cuda                                                                                                                                                                                                                  0.2s
 => => transferring dockerfile: 1.96kB                                                                                                                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                          0.4s
 => => transferring context: 72B                                                                                                                                                                                                                                           0.2s
 => [internal] load metadata for nvcr.io/nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04                                                                                                                                                                                     1.2s
 => [internal] load metadata for nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04                                                                                                                                                                                       2.1s
 => [internal] load build context                                                                                                                                                                                                                                        277.1s
 => => transferring context: 22.99GB                                                                                                                                                                                                                                     276.3s
 => CACHED [stage-0 1/4] FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04@sha256:21196d81f56b48dbee70494d5f10322e1a77cc47ffe202a3bf68eab81533c20f                                                                                                                  0.0s
 => CACHED [stage-1 1/4] FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04@sha256:f4d8e1264366940438f0353da6f289c7bef069d993d111f8106086ccd18c4a30                                                                                                                0.0s
 => [stage-0 2/4] ADD . /code                                                                                                                                                                                                                                            228.3s
 => [stage-0 3/4] RUN apt-get update && apt-get install -y --no-install-recommends python3-dev ca-certificates g++ python3-numpy gcc make git python3-setuptools python3-wheel python3-packaging python3-pip aria2 && aria2c -q -d /tmp -o cmake-3.26.3-linux-x86_64.tar  83.7s
 => ERROR [stage-0 4/4] RUN cd /code && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && /bin/bash ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /u  50.1s
------                                                              
 > [stage-0 4/4] RUN cd /code && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && /bin/bash ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --enable_cuda_profiling --config Release --build_wheel --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86':
#8 1.302 Ignoring numpy: markers 'python_version >= "3.11"' don't match your environment                                                
#8 1.309 Collecting onnx                                                                                                                
#8 1.309   Cloning http://github.com/onnx/onnx.git (to revision e2525550194ce3d8a2c4a3af451c9d9b3ae6650e) to /tmp/pip-install-kalf2t6q/onnx_9d9e9b391ba3498d93d5e7475c0324f1                                                                                                    
#8 1.312   Running command git clone --filter=blob:none --quiet http://github.com/onnx/onnx.git /tmp/pip-install-kalf2t6q/onnx_9d9e9b391ba3498d93d5e7475c0324f1
#8 10.23   warning: redirecting to https://github.com/onnx/onnx.git/                                          
#8 15.06   Running command git rev-parse -q --verify 'sha^e2525550194ce3d8a2c4a3af451c9d9b3ae6650e'
#8 15.06   Running command git fetch -q http://github.com/onnx/onnx.git e2525550194ce3d8a2c4a3af451c9d9b3ae6650e                        
#8 15.64   Running command git checkout -q e2525550194ce3d8a2c4a3af451c9d9b3ae6650e
#8 16.03   warning: redirecting to https://github.com/onnx/onnx.git/                                                                    
#8 16.86   Resolved http://github.com/onnx/onnx.git to commit e2525550194ce3d8a2c4a3af451c9d9b3ae6650e
#8 16.86   Running command git submodule update --init --recursive -q                                                                   
#8 24.11   Installing build dependencies: started
#8 27.49   Installing build dependencies: finished with status 'done'                                                                   
#8 27.50   Getting requirements to build wheel: started        
#8 27.71   Getting requirements to build wheel: finished with status 'done'                                                             
#8 27.80   Installing backend dependencies: started      
#8 29.27   Installing backend dependencies: finished with status 'done'                                       
#8 29.27   Preparing metadata (pyproject.toml): started
#8 30.35   Preparing metadata (pyproject.toml): finished with status 'done'
#8 30.84 Collecting numpy==1.21.6                 
#8 30.94   Downloading numpy-1.21.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)                                
#8 39.91      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.9/15.9 MB 1.6 MB/s eta 0:00:00
#8 40.62 Collecting mypy                                                                                                                
#8 40.63   Downloading mypy-1.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.5 MB)
#8 42.80      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.5/12.5 MB 6.1 MB/s eta 0:00:00
#8 42.96 Collecting pytest                                                                                                              
#8 42.98   Downloading pytest-7.4.3-py3-none-any.whl (325 kB)                                                                           
#8 43.04      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 325.1/325.1 KB 5.0 MB/s eta 0:00:00
#8 43.05 Requirement already satisfied: setuptools>=41.4.0 in /usr/lib/python3/dist-packages (from -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt (line 5)) (59.6.0)                                                                   
#8 43.05 Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt (line 6)) (0.37.1)                                                                                
#8 43.40 Collecting protobuf==3.20.2                                                                                                    
#8 43.46   Downloading protobuf-3.20.2-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
#8 43.94      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 2.1 MB/s eta 0:00:00                                                  
#8 44.05 Collecting sympy==1.12                           
#8 44.10   Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
#8 45.47      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 4.2 MB/s eta 0:00:00                                    
#8 45.54 Collecting flatbuffers
#8 45.55   Downloading flatbuffers-23.5.26-py2.py3-none-any.whl (26 kB)                                                                                                                                                                                                         
#8 45.66 Collecting mpmath>=0.19                                                                                                                                                                                                                                                
#8 45.69   Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)                                                                           
#8 45.83      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 4.0 MB/s eta 0:00:00
#8 45.92 Collecting tomli>=1.1.0
#8 45.97   Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
#8 46.09 Collecting mypy-extensions>=1.0.0
#8 46.12   Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
#8 46.27 Collecting typing-extensions>=4.1.0
#8 46.28   Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
#8 46.42 Collecting exceptiongroup>=1.0.0rc8
#8 46.45   Downloading exceptiongroup-1.2.0-py3-none-any.whl (16 kB) 
#8 46.55 Collecting iniconfig
#8 46.58   Downloading iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
#8 46.62 Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from pytest->-r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt (line 4)) (21.3)
#8 46.69 Collecting pluggy<2.0,>=0.12
#8 46.70   Downloading pluggy-1.3.0-py3-none-any.whl (18 kB)
#8 46.77 Building wheels for collected packages: onnx
#8 46.78   Building wheel for onnx (pyproject.toml): started
#8 47.30   Building wheel for onnx (pyproject.toml): finished with status 'error'
#8 47.32   error: subprocess-exited-with-error
#8 47.32   
#8 47.32   × Building wheel for onnx (pyproject.toml) did not run successfully.
#8 47.32   │ exit code: 1
#8 47.32   ╰─> [77 lines of output]
#8 47.32       running bdist_wheel
#8 47.32       running build
#8 47.32       running build_py
#8 47.32       running create_version
#8 47.32       running cmake_build
#8 47.32       -- The C compiler identification is GNU 11.4.0
#8 47.32       -- The CXX compiler identification is GNU 11.4.0
#8 47.32       -- Detecting C compiler ABI info
#8 47.32       -- Detecting C compiler ABI info - done
#8 47.32       -- Check for working C compiler: /usr/bin/cc - skipped
#8 47.32       -- Detecting C compile features
#8 47.32       -- Detecting C compile features - done
#8 47.32       -- Detecting CXX compiler ABI info
#8 47.32       -- Detecting CXX compiler ABI info - done
#8 47.32       -- Check for working CXX compiler: /usr/bin/c++ - skipped
#8 47.32       -- Detecting CXX compile features
#8 47.32       -- Detecting CXX compile features - done
#8 47.32       -- Found PythonInterp: /usr/bin/python3 (found version "3.10.12")
#8 47.32       -- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.10.so (found version "3.10.12")
#8 47.32       Generated: /tmp/pip-install-kalf2t6q/onnx_9d9e9b391ba3498d93d5e7475c0324f1/.setuptools-cmake-build/onnx/onnx-ml.proto
#8 47.32       CMake Error at CMakeLists.txt:303 (message):
#8 47.32         Protobuf compiler not found
#8 47.32       Call Stack (most recent call first):
#8 47.32         CMakeLists.txt:334 (relative_protobuf_generate_cpp) 
#8 47.32       
#8 47.32       
#8 47.32       -- Configuring incomplete, errors occurred!
#8 47.32       Traceback (most recent call last):
#8 47.32         File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
#8 47.32           main()
#8 47.32         File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
#8 47.32           json_out['return_val'] = hook(**hook_input['kwargs'])
#8 47.32         File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 261, in build_wheel
#8 47.32           return _build_backend().build_wheel(wheel_directory, config_settings,
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 230, in build_wheel
#8 47.32           return self._build_with_temp_dir(['bdist_wheel'], '.whl',
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
#8 47.32           self.run_setup()
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 267, in run_setup
#8 47.32           super(_BuildMetaLegacyBackend,
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
#8 47.32           exec(compile(code, __file__, 'exec'), locals())
#8 47.32         File "setup.py", line 342, in <module>
#8 47.32           setuptools.setup(
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
#8 47.32           return distutils.core.setup(**attrs)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 148, in setup
#8 47.32           return run_commands(dist)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 163, in run_commands
#8 47.32           dist.run_commands()
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 967, in run_commands
#8 47.32           self.run_command(cmd)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
#8 47.32           cmd_obj.run()
#8 47.32         File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
#8 47.32           self.run_command('build')
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
#8 47.32           self.distribution.run_command(command)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
#8 47.32           cmd_obj.run()
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
#8 47.32           self.run_command(cmd_name)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
#8 47.32           self.distribution.run_command(command)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
#8 47.32           cmd_obj.run()
#8 47.32         File "setup.py", line 236, in run
#8 47.32           self.run_command("cmake_build")
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
#8 47.32           self.distribution.run_command(command)
#8 47.32         File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
#8 47.32           cmd_obj.run()
#8 47.32         File "setup.py", line 222, in run
#8 47.32           subprocess.check_call(cmake_args)
#8 47.32         File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
#8 47.32           raise CalledProcessError(retcode, cmd)
#8 47.32       subprocess.CalledProcessError: Command '['/usr/bin/cmake', '-DPYTHON_INCLUDE_DIR=/usr/include/python3.10', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DBUILD_ONNX_PYTHON=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DONNX_NAMESPACE=onnx', '-DPY_EXT_SUFFIX=.cpython-310-x86_64-linux-gnu.so', '-DCMAKE_BUILD_TYPE=Release', '-DONNX_ML=1', '/tmp/pip-install-kalf2t6q/onnx_9d9e9b391ba3498d93d5e7475c0324f1']' returned non-zero exit status 1.
#8 47.32       [end of output]
#8 47.32   
#8 47.32   note: This error originates from a subprocess, and is likely not a problem with pip.
#8 47.32   ERROR: Failed building wheel for onnx
#8 47.32 Failed to build onnx
#8 47.32 ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects
------
process "/bin/sh -c cd /code && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && /bin/bash ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --enable_cuda_profiling --config Release --build_wheel --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86'" did not complete successfully: exit code: 1
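
Note on the log above: the step that exits with an error is the onnx wheel build during pip install, where CMake reports "Protobuf compiler not found", so in this Docker run the build.sh call may never have been reached. A minimal check that could be run in the build stage before the pip install is sketched below. It assumes the compiler CMake is looking for is a `protoc` binary on PATH, which I have not confirmed; the helper name `have_protoc` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a `protoc` executable is visible on PATH.
# Assumption: the onnx CMake configure step needs this binary.
have_protoc() {
    command -v protoc >/dev/null 2>&1
}

# Report the result without aborting, so it can be dropped into a RUN step.
if have_protoc; then
    echo "protoc found at: $(command -v protoc)"
else
    echo "protoc not found on PATH"
fi
```

If this prints "protoc not found on PATH" inside the devel image, the pip failure is probably unrelated to --enable_cuda_profiling and would need to be fixed first to reproduce the local flash attention compile error in Docker.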

Visual Studio Version

No response

GCC / Compiler Version

No response
