Open luator opened 10 months ago
I implemented this option. It is meant for advanced users who know what they are doing (a big warning is even printed), and the intended way to use cluster_utils for production is with a git repository. As such, I believe the option should do exactly what it says, i.e. run in the working directory without copying anything to the cache. Making the option "safer" would normalize using it, which would discourage people from using git commits.
Copying the working dir could also have unintended consequences if users store larger amounts of data in their folder, e.g. the virtual environment, data or output checkpoints. I think the project directory is currently not even removed by cluster_utils, see #11.
By Maximilian Seitzer on 2023-11-28T16:32:00 (imported from GitLab)
The feature request comes from a discussion with @jfrey (pinging you, in case you want to defend it :) ). I agree with Max, though, that for proper experiments one should use git. So I would also rather keep the behaviour as is.
By Felix Widmaier on 2023-11-28T16:32:00 (imported from GitLab)
I have nothing against adding a similar option that copies the directory (though again, only for advanced users who know what they are doing).
By Maximilian Seitzer on 2023-11-28T16:37:56 (imported from GitLab)
Normally, cluster_utils clones the code from a git repository to a cache directory and uses that to execute the jobs. When setting
run_in_working_dir=true
in the configuration, this does not happen; instead, jobs are executed in the current working directory. This risks messing things up if the code in the directory is modified while cluster_utils is still running (i.e. different jobs may use different versions of the code). To avoid that, we could copy the current working directory to the cache and run from there.
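To make the proposal concrete, here is a rough sketch of what "copy the working directory to the cache" could look like. It is not how cluster_utils is implemented: the function name `snapshot_working_dir`, the `cache_root` argument and the ignore patterns are all made up for illustration. It only uses `shutil.copytree` from the standard library and skips some typically large folders, since copying virtual environments or checkpoints would be costly.

```python
import shutil
import uuid
from pathlib import Path

# Hypothetical defaults for illustration; not taken from cluster_utils.
DEFAULT_IGNORE = (".git", ".venv", "venv", "__pycache__", "*.ckpt")


def snapshot_working_dir(src, cache_root, ignore=DEFAULT_IGNORE):
    """Copy `src` into a fresh subdirectory of `cache_root` and return its path.

    Jobs started from the returned directory are unaffected by later
    edits to `src`.
    """
    src = Path(src).resolve()
    cache_root = Path(cache_root).resolve()

    # Refuse a cache inside the working directory, otherwise the copy
    # would try to include itself.
    if cache_root == src or src in cache_root.parents:
        raise ValueError("cache_root must not be inside the working directory")

    cache_root.mkdir(parents=True, exist_ok=True)

    # A unique name per run, so concurrent runs do not overwrite each other.
    dst = cache_root / f"snapshot-{uuid.uuid4().hex}"

    # Entries matching the ignore patterns are skipped at every level.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*ignore))
    return dst
```

Even with ignore patterns this can still copy a lot of data, and the snapshot would need to be cleaned up afterwards, so it does not remove the concerns about large working directories.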
pro:
con: