mseitzer opened this issue 7 months ago
Is it actually necessary to keep this in a separate directory? For me it would be more convenient to simply have it in the results_dir together with the other output. But if there is a reason to keep it separate, I'm all for making it configurable (then I can simply set it to the same path as results_dir).
With the multi-level configuration I see a bit of a risk that things get less reproducible if there is a global config which may differ between users. But if we want to do it, it should be pretty simple to implement with variconf, an omegaconf-wrapper I wrote some time ago for exactly that purpose.
For me, there are different categories of temporary files:
- `-project`: nearly always deleted after the run (except when debugging). I don't want to see it in my results dir.
- `-jobs`: nearly always deleted after the run (except when debugging). I don't want to see it in my results dir.
- `-jobs`: could be useful to keep, depending on user preference. Could be stored in the results dir.

Thus, I think it's still good to have a temporary folder (or folders) per run, and the location(s) should be configurable.
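A minimal sketch of "one temporary folder per run, with configurable locations", assuming the three categories above are a copy of the git project, the job files, and the cluster logs (as in the original request). `RunDirConfig`, `resolve_run_dirs`, `cleanup_run` and all option names are hypothetical, not cluster_utils' actual API.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunDirConfig:
    """Hypothetical options, not cluster_utils' actual configuration keys."""

    tmp_base: Path  # user-configurable base for per-run scratch folders
    logs_in_results_dir: bool = False  # keep cluster logs next to the results?


def resolve_run_dirs(cfg: RunDirConfig, run_id: str, results_dir: Path) -> dict[str, Path]:
    """Map each category of temporary files to a directory for one run."""
    run_tmp = cfg.tmp_base / run_id  # one scratch folder per run
    if cfg.logs_in_results_dir:
        logs = results_dir / ".cluster_logs"
    else:
        logs = run_tmp / "cluster_logs"
    return {
        "project_copy": run_tmp / "project",  # usually deleted after the run
        "job_files": run_tmp / "jobs",  # usually deleted after the run
        "cluster_logs": logs,  # optionally kept with the results
    }


def cleanup_run(dirs: dict[str, Path], keep_for_debugging: bool = False) -> None:
    """Delete the throwaway categories; cluster logs are left untouched."""
    if keep_for_debugging:
        return
    for name in ("project_copy", "job_files"):
        shutil.rmtree(dirs[name], ignore_errors=True)
```

Keeping the category-to-path mapping in one function means that "put the logs into the results dir" is just a different value for one option, rather than a separate code path.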
Probably my preferences are a bit different as I only ever run cluster_utils for testing/debugging :D
Additionally, there should be the option of putting the cluster logs inside the results dir (either in `/.cluster_logs` or in `/working_directories//.cluster_logs`). This is what #21 tracks.
Sounds good. With OmegaConf this should actually be relatively easy, since one config value can reference other values as variables. So it might not even need to be an explicit feature once we support OmegaConf.
Actually, I think the nicest and most standard solution for this would be to respect the `XDG_CACHE_HOME` environment variable for the cache directory:
cache_dir = pathlib.Path(os.environ.get("XDG_CACHE_HOME") or "~/.cache").expanduser()
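A slightly fuller sketch that follows the XDG Base Directory rules more closely: fall back to `$HOME/.cache` when `XDG_CACHE_HOME` is unset or empty, and ignore it when it is a relative path. The app-specific subfolder name `cluster_utils` is an assumption.

```python
import os
from pathlib import Path


def cache_dir(app_name: str = "cluster_utils") -> Path:
    """Per-application cache directory following the XDG Base Directory spec.

    The spec says to fall back to $HOME/.cache when XDG_CACHE_HOME is unset or
    empty, and to treat a relative path in it as invalid and ignore it.
    """
    xdg_cache = os.environ.get("XDG_CACHE_HOME", "")
    if xdg_cache and os.path.isabs(xdg_cache):
        base = Path(xdg_cache)
    else:
        base = Path.home() / ".cache"
    # Hypothetical subfolder so cluster_utils files don't mix with other tools.
    return base / app_name
```

On a cluster where the home directory has a tight quota, a user could then set `XDG_CACHE_HOME` to a scratch filesystem once, and every XDG-aware tool (not just cluster_utils) would follow.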
Independent of this, I would suggest adding an option to store the stdout/stderr output of jobs in working_dir, but this is tracked in #21.
Let the user control the directory where cluster_utils stores job logs and git projects. This is important on systems where the home directory is space-restricted.
In general, we should probably have a better configuration story. The usual way it's done seems to be something like:
where the lower levels override the higher ones.
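A minimal sketch of that layering with plain nested dicts, where each later (lower-level) layer overrides the earlier ones. The layer names and keys are hypothetical; if the project moves to OmegaConf, `OmegaConf.merge(...)` gives the same "later configs win" behavior.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` take precedence.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_layered_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers from most general to most specific; later layers win."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layers, from most general to most specific.
defaults = {"dirs": {"tmp": "~/.cache/cluster_utils", "logs": "~/.cache/cluster_utils/logs"}}
user_global = {"dirs": {"tmp": "/scratch/me/cluster_utils"}}
experiment = {"results_dir": "/scratch/me/experiment_1"}
cli_overrides = {"dirs": {"logs": "/scratch/me/experiment_1/.cluster_logs"}}

config = load_layered_config(defaults, user_global, experiment, cli_overrides)
print(config)
```

Merging nested dicts recursively (instead of replacing whole sections) lets a user-level file change only `dirs.tmp` without having to repeat every other directory setting.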
Related: #21