stan-dev / cmdstanr

CmdStanR: the R interface to CmdStan
https://mc-stan.org/cmdstanr/

Building a model needs write access to cmdstan installation #995

Open weshinsley opened 4 months ago

weshinsley commented 4 months ago

Describe the bug

Not sure if this is an issue with cmdstanr, or cmdstan - and is something of a design problem, rather than a bug per se.

When we run

mod <- cmdstanr::cmdstan_model('code.stan', dir = path)

These two files (and perhaps others) get modified within the cmdstan installation :-

cmdstan-2.35.0\src\cmdstan\main.o
cmdstan-2.35.0\stan\lib\stan_math\make\ucrt

Therefore, this can only work if users have write-access to the cmdstan installation - something we need to avoid. See below.

To Reproduce

Run any stan model - we used the usual one in the start-up documentation, and put it in code.stan, and then...

path <- tempfile()
dir.create(path)
mod <- cmdstanr::cmdstan_model('code.stan', dir = path)

The above two files get modified, and their date will be changed.

Expected behavior

We should not have to grant users write-access to the cmdstan installation for them to run a model.

Operating system Windows 10

CmdStanR version number 8.0.1

Additional context

We are running Stan in an HPC cluster context, and the issue causes us two problems:-

  1. It requires all users to have write-access to the installation of cmdstan, otherwise they get an access denied error writing the ucrt file and their job fails.

  2. Further, there are possibilities that multiple users, or even the same user might run multiple Stan jobs on the same node at the same time, which will conflict with each other if a shared cmdstan install folder is used for read/writing as part of the process.

We definitely do not want to install a new separate private cmdstan installation for every cluster job, and it's surprising to us that running a model makes actual file changes within the cmdstan installation folders. Is there a way of ensuring this happens elsewhere? We had hoped providing a dir argument would allow this.

andrjohns commented 4 months ago

These files are not created/modified every time that a cmdstan model is executed - only the first time. If you install cmdstan and compile a model, these files will be created and will not be modified in subsequent calls.

weshinsley commented 4 months ago

We are observing something different to that - here is the relevant directory:-

 Directory of I:\cmdstan\cmdstan-2.35.0\stan\lib\stan_math\make

11/06/2024  14:03    <DIR>          .
11/06/2024  14:03    <DIR>          ..
03/06/2024  15:43             2,061 clang-tidy
03/06/2024  15:43            15,664 compiler_flags
03/06/2024  15:43               818 cpplint
03/06/2024  15:43               305 dependencies
03/06/2024  15:43            11,248 libraries
03/06/2024  15:43             1,137 standalone
03/06/2024  15:43             7,120 tests
12/06/2024  18:33                16 ucrt
               8 File(s)         38,369 bytes

And after running the same model again:-

 Directory of I:\cmdstan\cmdstan-2.35.0\stan\lib\stan_math\make

11/06/2024  14:03    <DIR>          .
11/06/2024  14:03    <DIR>          ..
03/06/2024  15:43             2,061 clang-tidy
03/06/2024  15:43            15,664 compiler_flags
03/06/2024  15:43               818 cpplint
03/06/2024  15:43               305 dependencies
03/06/2024  15:43            11,248 libraries
03/06/2024  15:43             1,137 standalone
03/06/2024  15:43             7,120 tests
12/06/2024  22:13                16 ucrt
               8 File(s)         38,369 bytes

and if I remove my permissions so I don't have write-access to this directory, then I get this error early in the job.

/bin/sh: line 1: stan/lib/stan_math/make/ucrt: Permission denied

andrjohns commented 4 months ago

@WardBrian it looks like the UCRT flag cache step is actually still running the detection and write step even when the file already exists

@weshinsley to work around this for now, you can delete or comment out lines 194-204 in stan/lib/stan_math/make/compiler_flags:

  make/ucrt:
    pound := \#
    UCRT_STRING := $(shell echo '$(pound)include <windows.h>' | $(CXX) -E -dM -  | $(STR_SEARCH) _UCRT)
    ifneq (,$(UCRT_STRING))
      IS_UCRT ?= true
    else
      IS_UCRT ?= false
    endif
    $(shell echo "IS_UCRT ?= $(IS_UCRT)" > $(MATH)make/ucrt)

  include make/ucrt

And add the following to your make/local file (you're using rtools44, so we know it's UCRT):

IS_UCRT=true

weshinsley commented 4 months ago

Thanks - that seems to solve the problem for cmdstan-2.35.0\stan\lib\stan_math\make\ucrt - but cmdstan-2.35.0\src\cmdstan\main.o is still getting rewritten every time. If I don't have write-access there, I get...

Assembler messages:
Fatal error: can't create src/cmdstan/main.o: Permission denied

make: *** [make/program:14: src/cmdstan/main.o] Error 1

weshinsley commented 4 months ago

(I searched the entire set of files for ones with a date change, and it was only those two examples. I am not sure if anything else gets created/deleted as part of the build process; only these two gave me permission problems)
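The date-change search described above can be sketched in base R. This is a minimal illustration, not part of cmdstanr: `snapshot_mtimes()` and `changed_files()` are hypothetical helper names, and it only detects files that were created or had their modification time changed, not files that were written and then deleted during the build.

```r
# Record the modification time of every file under `root`
# (as seconds since the epoch, named by file path).
snapshot_mtimes <- function(root) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE)
  stats::setNames(as.numeric(file.info(files)$mtime), files)
}

# Compare two snapshots and return files that are new or were touched.
changed_files <- function(before, after) {
  common   <- intersect(names(before), names(after))
  modified <- common[before[common] != after[common]]
  created  <- setdiff(names(after), names(before))
  sort(c(modified, created))
}

# Usage sketch (requires cmdstanr and an installed CmdStan):
# before <- snapshot_mtimes(cmdstanr::cmdstan_path())
# mod    <- cmdstanr::cmdstan_model("code.stan", dir = path)
# changed_files(before, snapshot_mtimes(cmdstanr::cmdstan_path()))
```

Running this around a single `cmdstan_model()` call should list exactly the files the build touched inside the installation.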

weshinsley commented 4 months ago

Also just to say - thank you for the rapid responses/suggestions with this. I am on UK time so signing off tonight, but will try any suggestions in the morning and report back.

WardBrian commented 4 months ago

There are two issues here:

The first will continue to give you headaches even if we fix all the examples of the second. For example, the first time someone wants to build a model with multithreading support, cmdstan will attempt to create a src/cmdstan/main_threads.o. With our current build system, there is really no good way around it. In a shared environment, the best you could currently do is probably have a copy of the Stan math library somewhere that all users point their cmdstan/stan installation at, and have each user have their own cmdstan in a writable location

weshinsley commented 4 months ago

Wouldn't each user potentially have to have a separate cmdstan install per cluster job that might concurrently run? For users running a lot of jobs, that's potentially quite a burden.

If writing those .o files in some other working directory is not possible, then perhaps the way for now (hinted at in the other issue I think) is to have our shared read-only cmdstan and make a complete copy of it in a temp directory at the start of every stan cluster job, then delete that after the job has finished.
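A rough sketch of that copy-per-job idea in R follows. The helper names `copy_cmdstan_to_scratch()` and `run_job_with_private_cmdstan()` are made up for illustration; `cmdstanr::set_cmdstan_path()` and `cmdstanr::cmdstan_model()` are the existing cmdstanr functions. It assumes the scratch location is writable and that copying the full installation per job is acceptable in time and disk space. Note that `file.copy()` copies permission bits by default, so a read-only file attribute on the source may carry over to the copy.

```r
# Copy a shared, read-only CmdStan installation into a private scratch
# directory and return the path of the private copy.
copy_cmdstan_to_scratch <- function(shared_install,
                                    scratch = tempfile("cmdstan_job_")) {
  dir.create(scratch, recursive = TRUE)
  # Copying a directory into an existing directory yields
  # <scratch>/<basename(shared_install)>.
  ok <- file.copy(shared_install, scratch, recursive = TRUE)
  if (!all(ok)) stop("Could not copy ", shared_install, " to ", scratch)
  file.path(scratch, basename(shared_install))
}

# Build and sample from a model against the private copy, then remove
# the copy when the job finishes (even if it fails).
run_job_with_private_cmdstan <- function(shared_install, stan_file, ...) {
  private <- copy_cmdstan_to_scratch(shared_install)
  on.exit(unlink(dirname(private), recursive = TRUE), add = TRUE)
  cmdstanr::set_cmdstan_path(private)
  mod <- cmdstanr::cmdstan_model(stan_file)
  mod$sample(...)
}
```

Because each job builds against its own copy, concurrent jobs on the same node no longer race on `main.o` or `make/ucrt`, at the cost of one full copy per job.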

WardBrian commented 4 months ago

That might be necessary if different models are needed at different points during the cluster run.

In my experience (with cmdstanpy, not cmdstanr, but I imagine the same is possible) I have usually compiled my model once, on the head node, and then just copied the executable to each worker node and instantiated my cmdstanmodel object with just that exe, not the path to the .stan file. This will not invoke the cmdstan build system at all during the job, unlike passing the path to the stan file, which does invoke it to make sure everything is “up to date”.
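In cmdstanr the same pattern might look like the sketch below. It relies on the `exe_file` argument of `cmdstan_model()` and the `$exe_file()` method of the model object; the wrapper names `compile_once()` and `load_precompiled()` are hypothetical. Whether this avoids every write into the installation on a given setup has not been verified here, and some model methods that need the Stan source (such as `$code()`) are not available when only the executable is supplied.

```r
# Head node (needs a writable CmdStan installation): compile the model
# once and return the path to the resulting executable.
compile_once <- function(stan_file, exe_dir) {
  mod <- cmdstanr::cmdstan_model(stan_file, dir = exe_dir)
  mod$exe_file()
}

# Worker node: create the model object from the existing executable
# only, without passing the .stan file.
load_precompiled <- function(exe_path) {
  cmdstanr::cmdstan_model(exe_file = exe_path)
}

# Usage sketch:
# exe <- compile_once("code.stan", exe_dir = "shared/models")
# fit <- load_precompiled(exe)$sample(data = my_data)
```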

andrjohns commented 4 months ago

Given that cmdstan_main.o is just being repeatedly linked against when compiling a model, could we instead build cmdstan_main_* as a shared/static library when build is called? That way the model compilation shouldn't modify the file

WardBrian commented 4 months ago

There are technically combinatorially many main.o options - THREADS, MPI, OPENCL, and NORANGE, plus all possible combinations of them…

weshinsley commented 4 months ago

> In my experience (with cmdstanpy, not cmdstanr, but I imagine the same is possible) I have usually compiled my model once, on the head node, and then just copied the executable to each worker node and instantiated my cmdstanmodel object with just that exe, not the path to the .stan file.

This helps once you've got the exe, but to build it, you have to have write-access - hence your own private cmdstan copy on the head-node, so that it can change its own main.o when it builds your model, and so that no-one else uses the same cmdstan installation at the same time for building their model.

While this could be configured, it's very unusual to open up write-access to tools in that way, and with a lot of users, at 2.5 GB per current cmdstan installation, it feels likely to cause a lot of messy storage use and not be scalable.

Ultimately, just being able to set a working directory for the build seems to me the ideal angle to try and improve this. Clearly just in our conversation here, the current approach of stan(cmd) is difficult to work with, and HPC cluster use of Stan is only going to increase.

WardBrian commented 4 months ago

> Ultimately, just being able to set a working directory for the build seems to me the ideal angle to try and improve this.

Yes, this would be ideal. Unfortunately re-writing the build system is difficult, but some work is ongoing.

I think one should think of the source files in the CmdStan installation as part of the user's code, for now. I believe the vast majority of the file size is from the math library and its dependencies, so having a built version of that somewhere which other installations can point to should cut down on the directory issues.

weshinsley commented 4 months ago

OK - are there any docs/pointers for how to build the stan math library on its own, and then get that wired into cmdstan for when it builds your model?

It sounds like a way forward if that can reduce the size of the cmdstan installation we'd need to replicate for all jobs to something much more minimal. It would also have the advantage of speeding up model-building, which is taking a few minutes in our tests so far.

WardBrian commented 4 months ago

The minimal set up is:

The same could be done with stan-dev/stan and the STAN setting if all your users set PRECOMPILED_HEADERS=false, otherwise writes to the stan subfolder are necessary