Open weshinsley opened 4 months ago
These files are not created/modified every time that a cmdstan model is executed - only the first time. If you install cmdstan and compile a model, these files will be created and will not be modified in subsequent calls
We are observing something different to that - here is the relevant directory:-
Directory of I:\cmdstan\cmdstan-2.35.0\stan\lib\stan_math\make
11/06/2024 14:03 <DIR> .
11/06/2024 14:03 <DIR> ..
03/06/2024 15:43 2,061 clang-tidy
03/06/2024 15:43 15,664 compiler_flags
03/06/2024 15:43 818 cpplint
03/06/2024 15:43 305 dependencies
03/06/2024 15:43 11,248 libraries
03/06/2024 15:43 1,137 standalone
03/06/2024 15:43 7,120 tests
12/06/2024 18:33 16 ucrt
8 File(s) 38,369 bytes
And after running the same model again:-
Directory of I:\cmdstan\cmdstan-2.35.0\stan\lib\stan_math\make
11/06/2024 14:03 <DIR> .
11/06/2024 14:03 <DIR> ..
03/06/2024 15:43 2,061 clang-tidy
03/06/2024 15:43 15,664 compiler_flags
03/06/2024 15:43 818 cpplint
03/06/2024 15:43 305 dependencies
03/06/2024 15:43 11,248 libraries
03/06/2024 15:43 1,137 standalone
03/06/2024 15:43 7,120 tests
12/06/2024 22:13 16 ucrt
8 File(s) 38,369 bytes
and if I remove my permissions so I don't have write-access to this directory, then I get this error early in the job.
/bin/sh: line 1: stan/lib/stan_math/make/ucrt: Permission denied
@WardBrian it looks like the UCRT flag cache step is actually still running the detection and write step even when the file already exists
@weshinsley to work around this for now, you can delete or comment out L194-204 in stan/lib/stan_math/make/compiler_flags:
make/ucrt:
pound := \#
UCRT_STRING := $(shell echo '$(pound)include <windows.h>' | $(CXX) -E -dM - | $(STR_SEARCH) _UCRT)
ifneq (,$(UCRT_STRING))
IS_UCRT ?= true
else
IS_UCRT ?= false
endif
$(shell echo "IS_UCRT ?= $(IS_UCRT)" > $(MATH)make/ucrt)
include make/ucrt
And add the following to your make/local file (you're using rtools44, so we know it's UCRT):
IS_UCRT=true
Thanks - that seems to solve the problem for cmdstan-2.35.0\stan\lib\stan_math\make\ucrt - but cmdstan-2.35.0\src\cmdstan\main.o is still getting rewritten every time. If I don't have write-access there, I get...
Assembler messages:
Fatal error: can't create src/cmdstan/main.o: Permission denied
make: *** [make/program:14: src/cmdstan/main.o] Error 1
(I searched the entire set of files for ones with a date change, and it was only those two examples. I am not sure if anything else gets created/deleted as part of the build process; only these two gave me permission problems)
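The search for files with a changed date can be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it only detects files that were added, removed, or re-timestamped, not files that were created and deleted again during the build.

```python
from pathlib import Path


def snapshot_mtimes(root):
    """Map each file under root (as a relative path) to its mtime in nanoseconds."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.stat().st_mtime_ns
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_files(before, after):
    """Return the sorted relative paths that were added, removed, or re-timestamped."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Taking one snapshot of the cmdstan installation before running the model and one afterwards, then passing both to `changed_files`, should list entries such as `src\cmdstan\main.o` if they were rewritten.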
Also just to say - thank you for the rapid responses/suggestions with this. I am on UK time so signing off tonight, but will try any suggestions in the morning and report back.
There are two issues here:
The first will continue to give you headaches even if we fix all the examples of the second. For example, the first time someone wants to build a model with multithreading support, cmdstan will attempt to create a src/cmdstan/main_threads.o. With our current build system, there is really no good way around it. In a shared environment, the best you could currently do is probably to keep a copy of the Stan math library somewhere that all users point their cmdstan/stan installation at, and have each user keep their own cmdstan in a writable location.
Wouldn't each user potentially have to have a separate cmdstan install per cluster job that might concurrently run? For users running a lot of jobs, that's potentially quite a burden.
If writing those .o files in some other working directory is not possible, then perhaps the way forward for now (hinted at in the other issue, I think) is to keep our shared read-only cmdstan and make a complete copy of it in a temp directory at the start of every Stan cluster job, then delete that copy after the job has finished.
That might be necessary if different models are needed at different points during the cluster run.
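The copy-then-delete idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested HPC recipe: `private_cmdstan_copy` and `scratch_root` are hypothetical names, and copying a multi-gigabyte tree per job has an obvious time and storage cost.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_cmdstan_copy(shared_install, scratch_root=None):
    """Copy a shared CmdStan tree into a per-job scratch directory.

    Yields the path of the writable copy and removes it when the block exits.
    """
    scratch = None if scratch_root is None else str(scratch_root)
    job_dir = Path(tempfile.mkdtemp(prefix="cmdstan_job_", dir=scratch))
    try:
        target = job_dir / Path(shared_install).name
        shutil.copytree(shared_install, target)
        yield target
    finally:
        # Best-effort cleanup, even if the job body raised an exception.
        shutil.rmtree(job_dir, ignore_errors=True)
```

Inside the `with` block, the job would point cmdstanr or cmdstanpy at the yielded path instead of the shared installation. Note that `shutil.copytree` also copies permission bits, so if the shared tree is made read-only through file modes rather than ACLs, the copy may need its permissions relaxed before building.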
In my experience (with cmdstanpy, not cmdstanr, but I imagine the same is possible) I have usually compiled my model once, on the head node, and then just copied the executable to each worker node and instantiated my CmdStanModel object with just that exe, not the path to the .stan file. This will not invoke the cmdstan build system at all during the job, unlike passing the path to the .stan file, which does invoke it to make sure everything is "up to date".
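A minimal sketch of that pattern with cmdstanpy follows. `load_prebuilt_model` is a hypothetical helper; `CmdStanModel(exe_file=...)` is the cmdstanpy constructor argument for a precompiled executable. Whether cmdstanpy still needs to locate a CmdStan installation for other operations on the worker node has not been checked here.

```python
from pathlib import Path


def load_prebuilt_model(exe_path):
    """Wrap an already-compiled Stan executable without passing a .stan file."""
    exe_path = Path(exe_path)
    if not exe_path.is_file():
        raise FileNotFoundError(f"compiled model not found: {exe_path}")
    # Imported lazily so the check above works even where cmdstanpy is absent.
    from cmdstanpy import CmdStanModel

    # Only exe_file is given, so no .stan file is available to recompile.
    return CmdStanModel(exe_file=str(exe_path))
```

On a worker node this would be used as `model = load_prebuilt_model(path_to_copied_exe)` followed by `model.sample(data=...)`, where the paths are whatever the job copied over from the head node.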
Given that cmdstan_main.o is just being repeatedly linked against when compiling a model, could we instead build cmdstan_main_* as a shared/static library when build is called? That way the model compilation shouldn't modify the file.
There are technically combinatorially many main.o options - THREADS, MPI, OPENCL, and NORANGE, plus all possible combinations of them…
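To put a number on "combinatorially many": with the four flags named above, and assuming each can be switched on or off independently, there are 2^4 = 16 possible variants. The short enumeration below uses the flag names as written in this thread, not necessarily the exact make variable names.

```python
from itertools import combinations

FLAGS = ("THREADS", "MPI", "OPENCL", "NORANGE")


def flag_subsets(flags):
    """Yield every subset of the build flags, from none of them to all of them."""
    for size in range(len(flags) + 1):
        yield from combinations(flags, size)


variants = list(flag_subsets(FLAGS))
print(len(variants))  # 16, i.e. 2**4
```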
In my experience (with cmdstanpy, not cmdstanr, but I imagine the same is possible) I have usually compiled my model once, on the head node, and then just copied the executable to each worker node and instantiated my cmdstanmodel object with just that exe, not the path to the .stan file.
This helps once you've got the exe, but to build it, you have to have write-access - hence your own private cmdstan copy on the head-node, so that it can change its own main.o when it builds your model, and so that no-one else uses the same cmdstan installation at the same time for building their model.
While this could be configured, it's very unusual to open up write-access to tools in that way, and with a lot of users, at 2.5 GB per current cmdstan installation, it feels likely to cause a lot of messy storage use and not be scalable.
Ultimately, just being able to set a working directory for the build seems to me the ideal angle to try and improve this. Clearly just in our conversation here, the current approach of stan(cmd) is difficult to work with, and HPC cluster use of Stan is only going to increase.
Ultimately, just being able to set a working directory for the build seems to me the ideal angle to try and improve this.
Yes, this would be ideal. Unfortunately re-writing the build system is difficult, but some work is ongoing.
I think one should think of the source files in the CmdStan installation as part of the user's code, for now. I believe the vast majority of the file size is from the math library and its dependencies, so having a built version of that somewhere which other installations can point to should cut down on the directory issues.
OK - are there any docs/pointers for how to build the stan math library on its own, and then get that wired into cmdstan for when it builds your model?
It sounds like a way forward if that can reduce the size of the cmdstan installation we'd need to replicate for all jobs to something much more minimal. It would also have the advantage of speeding up model-building, which is taking a few minutes in our tests so far.
The minimal set up is:
Run make -f make/standalone math-libs to build the dependencies
Set MATH to point to that location, either on the command line, as an environment variable, or in $cmdstan/make/local
The same could be done with stan-dev/stan and the STAN setting if all your users set PRECOMPILED_HEADERS=false, otherwise writes to the stan subfolder are necessary.
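For the make/local route, the settings are plain NAME=value lines. The helper below is a hypothetical sketch for adding them to each user's cmdstan copy; the shared path in the usage note is made up. The makefile fragment quoted earlier in this thread concatenates `$(MATH)make/ucrt` directly, which suggests the MATH value should end with a path separator.

```python
from pathlib import Path


def append_make_settings(make_local, settings):
    """Append NAME=value lines to a make/local file, creating it if needed."""
    make_local = Path(make_local)
    make_local.parent.mkdir(parents=True, exist_ok=True)
    existing = make_local.read_text(encoding="utf-8") if make_local.exists() else ""
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    lines = "".join(f"{name}={value}\n" for name, value in settings.items())
    with make_local.open("a", encoding="utf-8") as fh:
        fh.write(prefix + lines)
```

For example, `append_make_settings(cmdstan_dir / "make" / "local", {"MATH": "/shared/stan_math/", "PRECOMPILED_HEADERS": "false"})`, where `cmdstan_dir` is the user's own writable cmdstan copy.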
Describe the bug
Not sure if this is an issue with cmdstanr or cmdstan - and it is something of a design problem, rather than a bug per se.
When we run
mod <- cmdstanr::cmdstan_model('code.stan', dir = path)
These two files (and perhaps others) get modified within the cmdstan installation:-
cmdstan-2.35.0\src\cmdstan\main.o
cmdstan-2.35.0\stan\lib\stan_math\make\ucrt
Therefore, this can only work if users have write-access to the cmdstan installation - something we need to avoid. See below.
To Reproduce
Run any Stan model - we used the usual one in the start-up documentation, and put it in code.stan, and then... The above two files get modified, and their dates will be changed.
Expected behavior
We should not have to grant users write-access to the cmdstan installation for them to run a model.
Operating system Windows 10
CmdStanR version number 8.0.1
Additional context
We are running Stan in an HPC cluster context, and the issue causes us two problems:-
It requires all users to have write-access to the installation of cmdstan, otherwise they get an access denied error writing the ucrt file and their job fails.
Further, multiple users, or even the same user, might run multiple Stan jobs on the same node at the same time, and these will conflict with each other if a shared cmdstan install folder is both read from and written to as part of the process.
We definitely do not want to install a new separate private cmdstan installation for every cluster job, and it's surprising to us that running a model makes actual file changes within the cmdstan installation folders. Is there a way of ensuring this happens elsewhere? We had hoped providing a dir argument would allow this.