[Build] ONNX Runtime build fails OOM (v1.20.0)

mc-nv commented 2 hours ago

Describe the issue

We are hitting a problem compiling against the rel-1.20.0 branch: the build runs out of memory on both Linux and Windows.

Windows config (64GB RAM):

BUILDTOOLS_VERSION:17.12.35506.116 
CMAKE_VERSION:3.30.5 
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26 
VCPGK_VERSION:2024.03.19

Linux config (64GB RAM):

CMAKE_VERSION:3.28.3
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26

Urgency

ASAP

Target platform

Linux, Windows

Build script

Windows:

onnxruntime/tools/ci_build/build.py `
   --cmake_generator "Visual Studio 17 2022" `
   --config Release `
   --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75;80;86;90" `
   --skip_submodule_sync `
   --parallel `
   --build_shared_lib `
   --compile_no_warning_as_error `
   --skip_tests `
   --update `
   --build `
   --build_dir /workspace/build `
   --use_cuda `
   --cuda_home ${env:CUDA_PATH} `
   --cudnn_home ${env:CUDA_PATH} `
   --use_tensorrt --tensorrt_home "/tensorrt" ; `

Linux:

./build.sh \
  --config Release \
  --skip_submodule_sync \
  --parallel \
  --build_shared_lib     \
  --compile_no_warning_as_error \
  --build_dir /workspace/build \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='75;80;86;90'  \
  --update \
  --build \
  --use_cuda \
  --cuda_home "/usr/local/cuda" \
  --cudnn_home "/usr" \
  --use_tensorrt \
  --use_tensorrt_builtin_parser \
  --tensorrt_home "/usr/src/tensorrt" \
  --allow_running_as_root \
  --use_openvino CPU

Error / output

No error message; the container fails because it runs out of memory.

Visual Studio Version

No response

GCC / Compiler Version

No response

mc-nv commented 2 hours ago

@snnn for viz

snnn commented 2 hours ago

Use "--parallel <n>" to reduce the parallelism.

snnn commented 2 hours ago

It is more about how much memory you have for each CPU core than how much memory you have in total.

mc-nv commented 1 hour ago

Note that the Linux build already uses --parallel, and it runs on heavy machines where we have never seen an issue building ONNX Runtime.

snnn commented 1 hour ago

Sorry, part of my response was eaten because of formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses, which leaves about 4GB for each. Since we do not know whether 4GB is enough for one compiler process, sometimes we need to manually adjust the parallelism to avoid OOM.
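To make the arithmetic above concrete, here is a rough sketch. The helper name and the per-process memory figures are assumptions for illustration, not measurements from an actual ONNX Runtime build.

```python
def max_parallel_jobs(total_mem_gb, per_process_gb, cpu_count):
    """Largest job count whose combined memory estimate still fits in RAM.

    per_process_gb is a guess at the peak memory of one compiler process;
    the real figure depends on the source file and the compiler.
    """
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    by_memory = int(total_mem_gb // per_process_gb)
    # Never exceed the CPU count, and always allow at least one job.
    return max(1, min(cpu_count, by_memory))


# 64GB and 16 CPUs: if 4GB per process is enough, all 16 jobs fit.
print(max_parallel_jobs(64, 4, 16))  # 16
# If a process can need 8GB, only 8 jobs fit.
print(max_parallel_jobs(64, 8, 16))  # 8
```

The result would be the number to pass after "--parallel".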

mc-nv commented 1 hour ago

Sounds like a suggestion to have 8GB per process, am I right?

mc-nv commented 1 hour ago

> Sorry my response was eaten by a part because of formatting. I meant, put a number there after "--parallel", to limit the number of concurrent processes. Let's say you have 64GB memory and 16 CPUs. By default make/msbuild will create at most 8 subprocesses. Since we do not know if 4GB is enough for one compiler process, sometimes we might need to manually adjust the parallelism to avoid OOM.

In my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171

What would be the reason to set the limit to 2 or 4 if we are failing with OOM using a single process?

snnn commented 55 minutes ago

Actually the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you can start by halving that number. For example, if the default value is 16, try 8 first. If the error persists, decrease it further. Eventually it will pass, because 64GB is definitely enough for a single compiler process.
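A minimal argparse sketch of the behaviour described above, assuming the flag is declared as an optional-value argument (nargs="?"). The parser below is illustrative and is not a copy of build.py; `effective_jobs` and `halving_schedule` are hypothetical helper names.

```python
import argparse
import os


def make_parser():
    parser = argparse.ArgumentParser()
    # Assumed declaration: flag absent -> default (1),
    # bare "--parallel" -> const (0), "--parallel N" -> N.
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser


def effective_jobs(argv):
    """Resolve the job count, treating 0 as 'use every CPU'."""
    args = make_parser().parse_args(argv)
    if args.parallel == 0:
        return os.cpu_count() or 1
    return args.parallel


def halving_schedule(cpus):
    """Job counts to try in order: half the CPUs, then half again, down to 1."""
    jobs = cpus // 2
    schedule = []
    while jobs >= 1:
        schedule.append(jobs)
        jobs //= 2
    return schedule


print(effective_jobs([]))                   # 1: flag not given at all
print(effective_jobs(["--parallel", "8"]))  # 8: explicit limit
print(effective_jobs(["--parallel"]))       # CPU count of this machine
print(halving_schedule(16))                 # [8, 4, 2, 1]
```

Under this assumption, a "1" default in the argument definition only applies when the flag is omitted entirely, which is why a bare "--parallel" can still fan out to every CPU.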

snnn commented 53 minutes ago

You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one.
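As a rough upper-bound model of why this matters: assume each nvcc invocation can run up to "--nvcc_threads" compilation steps at once, capped by the number of CUDA architectures being targeted. The function names and the 4GB-per-step figure are placeholders for illustration, not measured values.

```python
def worst_case_compiler_processes(parallel, nvcc_threads, cuda_arch_count):
    """Upper bound on compilation steps running at the same time."""
    # One nvcc cannot usefully run more parallel steps than it has
    # architectures to compile for (assumption of this simple model).
    per_nvcc = max(1, min(nvcc_threads, cuda_arch_count))
    return parallel * per_nvcc


def estimated_peak_gb(parallel, nvcc_threads, cuda_arch_count, per_step_gb):
    """Worst-case memory estimate if every step peaks simultaneously."""
    steps = worst_case_compiler_processes(parallel, nvcc_threads, cuda_arch_count)
    return steps * per_step_gb


# Four architectures (75;80;86;90), 16 parallel jobs, 4 nvcc threads each:
print(estimated_peak_gb(16, 4, 4, 4))  # 256, far above 64GB
# Same build with nvcc_threads set to one:
print(estimated_peak_gb(16, 1, 4, 4))  # 64
```

In this model the two settings multiply, so lowering either one reduces the worst-case memory footprint.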

mc-nv commented 53 minutes ago

My Windows build environment has 2 CPUs.
