Open mc-nv opened 2 hours ago
@snnn for viz
Use "--parallel <n>" to reduce the parallelism.
It is more about how much memory you have for each CPU core than how much memory you have in total.
See, the Linux build uses --parallel
and it runs on heavy machines where we have never seen an issue building ONNX Runtime.
Sorry, part of my response was eaten because of formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses, which leaves 4GB per process. Since we do not know whether 4GB is enough for one compiler process, sometimes we need to manually adjust the parallelism to avoid OOM.
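To make the arithmetic above concrete, here is a minimal sketch of how one might pick a value for "--parallel" from a memory budget. The function name `max_parallel_jobs` and the per-job memory figures are assumptions for illustration only; the real peak memory of a compiler process is not known in advance.

```python
def max_parallel_jobs(total_mem_gb, cpus, mem_per_job_gb):
    """Largest job count that fits in memory, capped at the CPU count.

    mem_per_job_gb is a guess at the peak memory of one compiler
    process (an assumption, not a measured value).
    """
    by_memory = int(total_mem_gb // mem_per_job_gb)
    # Always allow at least one job, never more jobs than CPUs.
    return max(1, min(cpus, by_memory))


# 64GB and 16 CPUs: full parallelism leaves 64 / 16 = 4GB per job.
print(max_parallel_jobs(64, 16, 4))  # 16
# If one job may need 8GB, only 8 jobs fit.
print(max_parallel_jobs(64, 16, 8))  # 8
```

The result is only a starting point; if the build still runs out of memory, the value has to be lowered further.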
Sounds like a suggestion to have 8Gb per process, am I right?
In my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171
What would be the reason to set the limit to 2 or 4 if we are failing with OOM using a single process?
Actually the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you may start by halving that number. For example, if we think the default value is 16, we try 8 first. If the error still exists, we decrease it further. Eventually it will pass, because 64GB is definitely enough for a single compiler process.
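The halving approach described above can be sketched as follows. Both helper names (`effective_jobs`, `halving_schedule`) are hypothetical, and `effective_jobs` only models the behavior as stated in this comment (0 means "use the number of CPUs"); it is not taken from build.py.

```python
def effective_jobs(requested, cpus):
    """Model of the stated rule: 0 means 'use all CPUs' (assumption)."""
    return cpus if requested == 0 else requested


def halving_schedule(start):
    """Job counts to try in order: half of start, then half again, down to 1."""
    jobs = []
    n = max(1, start // 2)
    while True:
        jobs.append(n)
        if n == 1:
            break
        n //= 2
    return jobs


# With 16 CPUs and "--parallel 0", try 8, then 4, 2 and finally 1.
print(halving_schedule(effective_jobs(0, 16)))  # [8, 4, 2, 1]
```

Each value would be passed as the number after "--parallel" until the build stops running out of memory.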
You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one.
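As a rough illustration, the two options mentioned in this thread could be combined on one command line like this. The helper `build_args` is hypothetical. Only "--parallel", "--nvcc_threads" and the script path come from this thread; every other option a real build needs is left out and would have to be appended.

```python
def build_args(parallel, nvcc_threads=1):
    """Assemble an argument list with explicit parallelism limits.

    Only the flags discussed in this thread are included; the remaining
    build options are not shown here.
    """
    return [
        "python",
        "tools/ci_build/build.py",
        "--parallel", str(parallel),
        "--nvcc_threads", str(nvcc_threads),
    ]


# Limit the build to 8 concurrent jobs and one nvcc thread.
print(" ".join(build_args(8)))
```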
My Windows build environment has 2 CPUs.
Describe the issue
Getting an issue trying to compile against the rel-1.20.0 branch. We are getting an out-of-memory issue on both Linux and Windows platforms.
Windows config (64GB RAM):
Linux (64GB RAM):
Urgency
ASAP
Target platform
Linux, Windows
Build script
Windows:
Linux:
Error / output
No error; the container fails by running out of memory.
Visual Studio Version
No response
GCC / Compiler Version
No response