Closed mseitzer closed 3 weeks ago
Hm, good questions. Activating an a venv and then running singularity indeed doesn't make much sense, while having a venv inside the container has valid use cases. Currently my suggestion is to do that via a runscript in the container (i.e. needs to be defined at build-time of the container), but a more flexible solution for this might be nice.
Regarding the pre_job_script
: It depends on what this script is used for. It could, for example, be used to copy a dataset from ${WORK} to the local ${SCRATCH} of the node for faster access. In that case it would be better to do it outside the container. There might be other use cases where it would better be inside the container.
Maybe as a solution that somewhat covers both: Keep the current options as they are but add an option env_setup_script
(name open to discussion) in the singularity
section. If set, this script is sourced (not executed) within the container before running the main script.
This could then be used in a flexible way to activate any kind of virtual environment and do other stuff as needed.
Regarding variables
: By default Singularity actually exports the environment, so unless it is run with --cleanenv
or --containall
, it should just work. If --cleanenv
/--containall
is used, one would need to prepend all names with SINGULARITYENV_
, then they will still be available inside the container. Maybe the latter could be done automatically when using Singularity, then the user doesn't have to worry about it.
Keep the current options as they are but add an option env_setup_script (name open to discussion) in the singularity section.
Yep, I also thought that having both options would be best.
Regarding variables
If we automate it, variables can also be passed with the --env <key>=<val>
syntax. Which seems a little cleaner than the SINGULARITY_ENV_
prefix.
I am wondering if it actually could make sense to always run with --cleanenv
, in order to improve isolation by default, and prevent subtle bugs.
If we automate it, variables can also be passed with the
--env <key>=<val>
syntax. Which seems a little cleaner than theSINGULARITY_ENV_
prefix.
:+1:
I am wondering if it actually could make sense to always run with
--cleanenv
, in order to improve isolation by default, and prevent subtle bugs.
I agree, setting it is probably the better default behaviour. I originally decided against it, as one might need access to some environment variables but it's probably better to explicitly list them in variables
in that case.
The discussion here is now covering multiple actual tasks. To keep overview, I created separate issues for them (see reference links above) and will close this one.
The current implementation of using singularity containers performs the environment setup outside of the container. I think in most cases, the environment would be installed within the container, and so the environment setup (e.g.
source venv/bin/activate
) needs to happen inside the container, before the main script is executed.I don't know how to best implement this. Maybe this also should to be left to the user, because there are many different things people could do? Although I think it would be great if cluster_utils offers a standard solution for that problem.
The same problem occurs for
pre_job_script
andvariables
. These should probably also act within the container.