Thanks for the find; this is a very good point. I'll address this in a PR tomorrow.
Thanks @nickjbrowning !
One thing I don't understand here is that we don't have that many files to compile, so `make -j` and `make -j8` should have the same behavior (launch ~8 compilation jobs).
It's a bit suspicious. My observations:

(a) compilation dies with `kill` on the default allocation on izar (4GB, I believe);
(b) if you remove `--parallel` from the `setup.py` file of `sphericart-torch`, it works without problem;
(c) requesting a node with 32GB also works, without modification.
Oh, right. I can see the compiler requiring a couple of GiB per file (there are a lot of torch headers to parse and templates to instantiate), so parallel compilation would fail with only 4GiB of available RAM. But then the change by @nickjbrowning would not fix it here, since the compilation would also fail with only 8 jobs.
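(Back-of-the-envelope, assuming roughly 2 GiB per translation unit: 8 parallel jobs × ~2 GiB ≈ 16 GiB, well past the 4GB default allocation but comfortably within the 32GB node, which would be consistent with observations (a) and (c) above.)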
I've added these two environment variables to the build process:
- `SPHERICART_PARALLEL_BUILD=ON`
- `SPHERICART_JOBS=NJOBS`
So you can now control the number of build jobs via:
```bash
SPHERICART_PARALLEL_BUILD=OFF pip install .[torch]  # disables parallel builds
SPHERICART_JOBS=4 pip install .[torch]              # uses 4 jobs for compilation
```
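For reference, a minimal sketch of how these variables could be consumed in `setup.py`; the variable names come from the comment above, but the surrounding build logic here is an assumption, not the actual sphericart code:

```python
import multiprocessing
import os

# Hypothetical sketch: translate the SPHERICART_* environment variables
# into arguments for `cmake --build`. The env var names match the ones
# documented above; everything else is illustrative.
def cmake_build_args():
    args = []
    if os.environ.get("SPHERICART_PARALLEL_BUILD", "ON").upper() != "OFF":
        # SPHERICART_JOBS sets the job count; fall back to the CPU count.
        n_jobs = os.environ.get("SPHERICART_JOBS", str(multiprocessing.cpu_count()))
        args += ["--parallel", n_jobs]
    return args
```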
Currently, attempting to build the `sphericart-torch` wheel with `pip` requires a large amount of RAM if many CPU cores are present. I think this is due to this line, which invokes `cmake` without specifying the number of jobs, so it presumably defaults to the total number of cores. On an HPC system that can be 40 or 80 cores, and so compilation tends to get `kill`ed by the host OS.

While this is not catastrophic, it is inconvenient, and a waste of resources in many cases (the compilation is not much faster in parallel mode). I would suggest defaulting to a reasonable number of jobs instead, or disabling parallel builds entirely. Alternatively, the installation docs should at least mention this fact (see #116).
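For concreteness, a minimal sketch of the suggested default, assuming the build is driven via `cmake --build` from `setup.py` (the cap of 4 jobs is an arbitrary illustration, not something from the codebase):

```python
import multiprocessing
import subprocess

# Hypothetical sketch: cap the default parallelism so that a 40- or 80-core
# HPC node does not launch one memory-hungry compiler job per core.
n_jobs = min(multiprocessing.cpu_count(), 4)  # the cap of 4 is illustrative
subprocess.run(
    ["cmake", "--build", ".", "--parallel", str(n_jobs)],
    check=True,
)
```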