stan-dev / cmdstanpy

CmdStanPy is a lightweight interface to Stan for Python users which provides the necessary objects and functions to compile a Stan program and fit the model to data using CmdStan.
BSD 3-Clause "New" or "Revised" License

add option to change default writing directory #747

Closed. Garren-H closed this issue 2 months ago.

Garren-H commented 2 months ago

Summary:

In response to my question raised here, @WardBrian suggested I open this issue. Currently cmdstanpy only writes its temporary files to /tmp, which may be undesirable in some situations, for example when that directory has a limited space allocation. Adding an option to change the default directory would therefore be useful.

Possible Solution:

After delving into the source a bit, I noticed that _TMPDIR is imported into several different modules. Since all of these are imported when invoking import cmdstanpy, I see no way (given the current structure) to add a cmdstanpy method that would let a user change _TMPDIR after importing cmdstanpy. My suggestion is therefore to change the __init__.py script slightly, which I think would be the most realistic option: let the user define a new environment variable, STAN_TMPDIR, containing the absolute path to an existing directory. We can then change the code as follows:

...
import os  # used to check whether STAN_TMPDIR is set and is an existing directory
import tempfile

# Use STAN_TMPDIR only if it is an absolute path to an existing directory.
# Otherwise pass dir=None, so tempfile picks its platform default
# (typically /tmp, unless TMPDIR, TEMP or TMP is set).
_stan_tmpdir = os.environ.get('STAN_TMPDIR')
if _stan_tmpdir and os.path.isabs(_stan_tmpdir) and os.path.isdir(_stan_tmpdir):
    dir_ = _stan_tmpdir
else:
    dir_ = None

_TMPDIR = tempfile.mkdtemp(dir=dir_)
...

This is a slight modification of the code I proposed on the Stan forums: I renamed the variable to dir_ after realizing that dir is a Python built-in function.
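The import-time problem described in this comment comes down to how Python binds names. A minimal sketch, using a hypothetical in-memory module named `demo_settings` in place of cmdstanpy, and assuming (as the comment says) that other modules pull `_TMPDIR` in with a from-import:

```python
import sys
import types

# Hypothetical stand-in for a package that computes _TMPDIR when it is imported
demo_settings = types.ModuleType('demo_settings')
demo_settings._TMPDIR = '/tmp/original'
sys.modules['demo_settings'] = demo_settings

# A submodule doing a from-import copies the current value into its own namespace
from demo_settings import _TMPDIR as submodule_tmpdir

# Reassigning the attribute on the defining module afterwards...
demo_settings._TMPDIR = '/scratch/new'

# ...does not update the copy the submodule already holds
print(submodule_tmpdir)        # /tmp/original
print(demo_settings._TMPDIR)   # /scratch/new
```

On top of the stale copies, the directory itself has already been created by the time any such reassignment could run, which is why the proposal reads the setting before `mkdtemp` is called.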

WardBrian commented 2 months ago

I did a bit more reading into the Python documentation and found this:

The default directory is chosen from a platform-dependent list, but the user of the application can control the directory location by setting the TMPDIR, TEMP or TMP environment variables.

Are those usable for your use case?
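A small sketch of the documented behaviour, using only the standard library: `tempfile.gettempdir()` checks TMPDIR before TEMP and TMP, and it caches its first answer in `tempfile.tempdir`. That cache is why the variable has to be set before anything in the process (such as an import-time `mkdtemp()` call) consults `tempfile`. The two scratch directories are placeholders created only for the demonstration.

```python
import os
import tempfile

# Two scratch directories standing in for a "system" location and a user's choice
system_like = tempfile.mkdtemp()
user_choice = tempfile.mkdtemp()

os.environ['TMP'] = system_like
os.environ['TMPDIR'] = user_choice  # TMPDIR is checked before TEMP and TMP
tempfile.tempdir = None             # discard any cached default so the env is re-read

first = tempfile.gettempdir()
print(first == user_choice)         # True

# Changing TMPDIR now has no effect: gettempdir() reuses its cached answer
os.environ['TMPDIR'] = system_like
print(tempfile.gettempdir() == user_choice)  # True
```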

Garren-H commented 2 months ago

That would work on a personal device, but it is not feasible on a cluster. The cluster I am working with uses PBS, which already sets TMPDIR to /tmp/<Job_ID>.<server>. I'm not sure whether PBS itself relies on TMPDIR, but I would rather not risk (potentially) messing up the system by testing this out.

WardBrian commented 2 months ago

It is a bit odd that your cluster would set it to a place they don't want you to use, but in any case I believe you could override it inside your script (leaving the rest of the system unaffected), as long as you do so before importing cmdstanpy:

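import os

# Must run before cmdstanpy is imported, since cmdstanpy picks its
# temporary directory when the package is first imported
os.environ['TMPDIR'] = '/your/desired/place'

import cmdstanpy
# proceed. You could even reset TMPDIR to its previous value at this point, I believe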
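Building on this suggestion, a hypothetical context manager could set TMPDIR only for the duration of the import and then restore it, which is the reset step mentioned in the comment above. This is a sketch, not part of cmdstanpy: `temporary_tmpdir` is an illustrative name, it assumes cmdstanpy creates its temporary directory at import time (as discussed in this thread), and it creates the target directory first because `tempfile` silently skips a TMPDIR that does not exist and falls back to the next candidate.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_tmpdir(path: str) -> Iterator[str]:
    """Hypothetical helper: point TMPDIR (and tempfile's default) at `path`
    inside the block, then restore the previous settings."""
    path = os.path.abspath(path)
    os.makedirs(path, exist_ok=True)  # tempfile ignores a TMPDIR that does not exist
    saved_env = os.environ.get('TMPDIR')
    saved_cache = tempfile.tempdir
    os.environ['TMPDIR'] = path
    tempfile.tempdir = None           # force tempfile to re-read TMPDIR
    try:
        yield path
    finally:
        # Restore the caller's environment and tempfile's cached default
        if saved_env is None:
            os.environ.pop('TMPDIR', None)
        else:
            os.environ['TMPDIR'] = saved_env
        tempfile.tempdir = saved_cache


# Usage (the path is a placeholder):
# with temporary_tmpdir('/scratch/your_user/stan_tmp'):
#     import cmdstanpy
```

If cmdstanpy has already been imported earlier in the same process, the import inside the `with` block is a no-op, so the new location would not take effect.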

Garren-H commented 2 months ago

I am using a university cluster, and according to the IT department the /tmp folder only has a few GB, which is used for caching (I might be wrong, as I read this some time last year, but it was something along these lines).

I hadn't thought of this possibility; it should definitely work. I will try it and get back to you.

Garren-H commented 2 months ago

Thanks again for the help, Brian! This gives the desired result.

WardBrian commented 2 months ago

Glad that worked! And thanks for raising the issue to begin with; I doubt we had ever given any thought to the default location of the temp directory.

We should probably at least mention those environment variables in our documentation somewhere.