Environment setup should be executed within singularity containers?

mseitzer commented 7 months ago

The current implementation of using singularity containers performs the environment setup outside of the container. I think in most cases, the environment would be installed within the container, and so the environment setup (e.g. source venv/bin/activate) needs to happen inside the container, before the main script is executed.

I don't know how to best implement this. Maybe this also should to be left to the user, because there are many different things people could do? Although I think it would be great if cluster_utils offers a standard solution for that problem.

The same problem occurs for pre_job_script and variables. These should probably also act within the container.

luator commented 7 months ago

Hm, good questions. Activating an a venv and then running singularity indeed doesn't make much sense, while having a venv inside the container has valid use cases. Currently my suggestion is to do that via a runscript in the container (i.e. needs to be defined at build-time of the container), but a more flexible solution for this might be nice.

Regarding the pre_job_script: It depends on what this script is used for. It could, for example, be used to copy a dataset from ${WORK} to the local ${SCRATCH} of the node for faster access. In that case it would be better to do it outside the container. There might be other use cases where it would better be inside the container.

Maybe as a solution that somewhat covers both: Keep the current options as they are but add an option env_setup_script (name open to discussion) in the singularity section. If set, this script is sourced (not executed) within the container before running the main script. This could then be used in a flexible way to activate any kind of virtual environment and do other stuff as needed.

Regarding variables: By default Singularity actually exports the environment, so unless it is run with --cleanenv or --containall, it should just work. If --cleanenv/--containall is used, one would need to prepend all names with SINGULARITYENV_, then they will still be available inside the container. Maybe the latter could be done automatically when using Singularity, then the user doesn't have to worry about it.

mseitzer commented 7 months ago

Keep the current options as they are but add an option env_setup_script (name open to discussion) in the singularity section.

Yep, I also thought that having both options would be best.

Regarding variables

If we automate it, variables can also be passed with the --env <key>=<val> syntax. Which seems a little cleaner than the SINGULARITY_ENV_ prefix.

I am wondering if it actually could make sense to always run with --cleanenv, in order to improve isolation by default, and prevent subtle bugs.

luator commented 7 months ago

If we automate it, variables can also be passed with the --env <key>=<val> syntax. Which seems a little cleaner than the SINGULARITY_ENV_ prefix.

:+1:

I am wondering if it actually could make sense to always run with --cleanenv, in order to improve isolation by default, and prevent subtle bugs.

I agree, setting it is probably the better default behaviour. I originally decided against it, as one might need access to some environment variables but it's probably better to explicitly list them in variables in that case.

luator commented 3 weeks ago

The discussion here is now covering multiple actual tasks. To keep overview, I created separate issues for them (see reference links above) and will close this one.

martius-lab / cluster_utils

Environment setup should be executed within singularity containers? #68