vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Singularity exec vs run #1517

Closed: gaow closed this issue 1 year ago

gaow commented 1 year ago

By default, SoS invokes Singularity via singularity exec. However, our recent applications need singularity run (see here for the difference, just for the record). singularity exec ignores the entrypoint, and as we migrate our package management to conda, using exec instead of run will break most of our conda-based Singularity images. Should we switch to singularity run as a safer default and somehow still support exec?

hsun3163 commented 1 year ago

An MWE (minimal working example) can be obtained with:

singularity pull --arch amd64 library://hs3163/collection/mwe.sif:sha256.44323a8aff68faff636de03c724bc62741e8268405f0a6c70984927922613068

What we can do now is:

singularity run --env ENV_NAME=hello mwe.sif R

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

But we cannot do that via exec, which is the SoS way:

singularity exec --env ENV_NAME=hello mwe.sif R

What we hope to achieve is that, by simply specifying the container and an additional parameter for ENV_NAME, SoS effectively runs:

singularity run --env ENV_NAME=hello mwe.sif R
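The difference between the two invocations can be made concrete with a small Python sketch that only builds the command lines; it does not call Singularity. The helper singularity_command and its parameters are hypothetical names used for illustration and are not part of SoS. The flags and the image name are the ones used in the commands above.

```python
import shlex


def singularity_command(image, command, env=None, use_run=False):
    """Build a Singularity command line as a list of arguments.

    Hypothetical helper for illustration only; not part of SoS.
    use_run=False gives `singularity exec` (skips the image's runscript),
    use_run=True gives `singularity run` (goes through the runscript).
    """
    args = ["singularity", "run" if use_run else "exec"]
    # Each environment variable gets its own --env NAME=value flag.
    for name, value in (env or {}).items():
        args += ["--env", f"{name}={value}"]
    args.append(image)
    args += list(command)
    return args


exec_cmd = singularity_command("mwe.sif", ["R"], env={"ENV_NAME": "hello"})
run_cmd = singularity_command("mwe.sif", ["R"], env={"ENV_NAME": "hello"}, use_run=True)

print(shlex.join(exec_cmd))
print(shlex.join(run_cmd))
```

The two printed lines differ only in the subcommand, which is the single switch being requested here.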

danielnachun commented 1 year ago

To expand on this, there are potentially two issues going on here:

1) It's not possible to have SoS invoke containers with singularity run instead of singularity exec. For conda-based containers (and possibly others) this is a major limitation, because exec bypasses the entry point that the container needs to work properly. An identical problem currently affects Nextflow: https://github.com/nextflow-io/nextflow/issues/3206. I believe some workflow managers use singularity exec because some containers have restrictive entry points, but for conda-based containers the entry point is just this: https://github.com/mamba-org/micromamba-docker/blob/main/_entrypoint.sh, which passes all commands through the shell after activating the conda environment.

2) It's unclear whether the --env ENV_NAME=hello flag is actually being passed through to singularity properly. This is needed for multi-environment conda-based containers to work properly. We can't yet determine whether this is actually a problem, because we are unable to test it without the ability to use singularity run.

Please let me know if there are further questions with how best to handle this! These should hopefully be relatively small tweaks to how the containers are run.

bioworkflows commented 1 year ago

How do you plan to use singularity run? According to https://vatlab.github.io/sos-docs/doc/user_guide/singularity.html , the way to use singularity is to ignore the default command and use the image as a container (singularity exec) so that we can execute arbitrary commands from the container.

So

%run -v1
sh: container='library://alpine:latest'
   cat /etc/os-release

is actually singularity exec sh the-shell-script.

If you really need to change the interpreter to include the entrypoint, maybe the syntax can be (not sure if it works):

%run -v1
script: container='library://alpine:latest', interpreter='/entrypoint Rscript'
  a R script

If you simply need to run a command (not a script), maybe it is easier just to do

sh:
    singularity pull...
    singularity run ...

bioworkflows commented 1 year ago

The consensus after some discussion is to add an optional parameter entrypoint, which will precede the interpreter and provide an entrypoint through which the interpreter is executed. So

R: entrypoint='/entrypoint'
   a script

will execute the script as /entrypoint Rscript, and

python: entrypoint='/entrypoint'
   a script

will execute the script as /entrypoint python.

As a special case,

script: entrypoint='/entrypoint'
   a script

will execute the script as /entrypoint because the script action does not have a default interpreter.
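The rule described above, where the entrypoint precedes the interpreter, can be sketched in Python as follows. This is an illustration under stated assumptions, not SoS's actual implementation: compose_command and DEFAULT_INTERPRETERS are hypothetical names, and the script file names are placeholders.

```python
# Default interpreter per action, taken from the three cases described above.
# Hypothetical table for illustration; SoS's real defaults may differ.
DEFAULT_INTERPRETERS = {"R": "Rscript", "python": "python", "script": None}


def compose_command(action, script, entrypoint=None, interpreter=None):
    """Return the argument list used to run `script` for a given action."""
    # An explicitly given interpreter overrides the action's default.
    if interpreter is None:
        interpreter = DEFAULT_INTERPRETERS.get(action)
    parts = []
    if entrypoint:
        parts.append(entrypoint)   # the entrypoint always comes first
    if interpreter:
        parts.append(interpreter)  # then the interpreter, if the action has one
    parts.append(script)
    return parts


print(compose_command("R", "script.R", entrypoint="/entrypoint"))
print(compose_command("python", "script.py", entrypoint="/entrypoint"))
print(compose_command("script", "script.sh", entrypoint="/entrypoint"))
```

For the script action the list contains only the entrypoint and the script, matching the special case where no default interpreter exists.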

The changes to be made in response to this ticket are:

  1. Add an interpreter option to actions that allows customization of the interpreter. It will have default values (Rscript for R, etc.) but will allow overrides such as R: interpreter='/path/to/Rscript'.
  2. Add an entrypoint option to actions that allows an entrypoint to be specified.
  3. For singularity, automatically extract the image entrypoint and allow it to be passed as a variable to the entrypoint option.
  4. Update the documentation at https://vatlab.github.io/sos-docs/doc/user_guide/script_actions.html and https://vatlab.github.io/sos-docs/doc/user_guide/shell_actions.html by updating the relevant pages under https://github.com/vatlab/sos-docs/tree/master/src/user_guide

bioworkflows commented 1 year ago

In Docker it is called entrypoint; in Singularity it is called runscript?
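On the terminology: Singularity stores an image's default command as its runscript, which plays the role of Docker's entrypoint. One possible way to read it from an image is the singularity inspect --runscript command. The sketch below wraps that command in Python; the function names are hypothetical, and this is an assumption about how the extraction could be done, not a description of what SoS does.

```python
import shutil
import subprocess


def runscript_command(image):
    """Command line that prints the runscript stored in a Singularity image."""
    return ["singularity", "inspect", "--runscript", image]


def get_runscript(image):
    """Return the image's runscript text, or None if singularity is not installed.

    Raises subprocess.CalledProcessError if singularity fails,
    for example when the image file does not exist.
    """
    if shutil.which("singularity") is None:
        return None
    result = subprocess.run(
        runscript_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout


print(runscript_command("mwe.sif"))
```

Only the command construction runs at import time, so the sketch can be tried on a machine without Singularity.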

BoPeng commented 1 year ago

I have merged the PR and released sos v0.24.1