singularityhub / singularity-hpc

Local filesystem registry for containers (intended for HPC) using Lmod or Environment Modules. Works for users and admins.
https://singularity-hpc.readthedocs.io
Mozilla Public License 2.0

Interaction between separate modules #544

Open JossWhittle opened 2 years ago

JossWhittle commented 2 years ago

Consider the following container.yaml snippets, which define two modules. Relion has both command-line utilities and a GUI (which has been aliased); Motioncor2 aliases a command-line program.

docker: quay.io/rosalindfranklininstitute/relion
features:
  gpu: true
  x11: true
aliases:
  relion: /usr/local/relion/bin/relion
  ...
  relion_refine_mpi: /usr/local/relion/bin/relion_refine_mpi

docker: quay.io/rosalindfranklininstitute/motioncor2
features:
  gpu: true
aliases:
  motioncor2: /usr/local/motioncor2/bin/motioncor2

Relion is usually used by domain scientists through the GUI. Internally, the GUI will execute one or more of Relion's command-line utilities, each of which may then access the GPU.

Motioncor2 can be run standalone on the CLI, but more commonly it will be called from the Relion GUI using a path to the executable. Running motioncor2 via the GUI is equivalent to running it through the relion-run wrapper, which I will use in the examples here for clarity:

module load motioncor2 relion

relion                # works and shows GUI!
relion_refine_mpi     # works on CLI!
motioncor2            # works on CLI!

relion-run motioncor2 
# does not work, can't find motioncor2 on PATH within relion container

# enabling wrapper scripts in settings.yaml rather than the default of using 
# shell functions gets us a step closer since we can see the path from the host
relion-run /path/to/modules/motioncor2/bin/motioncor2 
# but still does not work, the wrapper script is found on the host system correctly, 
# but the relion container does not have singularity installed so it cannot 
# execute the motioncor2 container
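
The difference between the two alias styles above can be reproduced without any containers. The following is a minimal sketch, not shpc's actual implementation: `demo_alias_fn` and `demo_alias_script` are hypothetical stand-ins for a module alias. It shows that a shell function defined in the current shell is invisible to a child process, while a wrapper script in a directory on PATH can be found by one.

```shell
#!/usr/bin/env bash
# Sketch only: all names here are hypothetical stand-ins for module aliases.

# 1. Alias as a shell function (stand-in for the function-style alias).
demo_alias_fn() { echo "called function"; }

# A child process does not inherit functions that were not exported,
# so the lookup fails there.
if sh -c 'command -v demo_alias_fn' >/dev/null 2>&1; then
  fn_visible=yes
else
  fn_visible=no
fi

# 2. Alias as a wrapper script in a directory that is put on PATH.
wrapper_dir="$(mktemp -d)"
cat > "$wrapper_dir/demo_alias_script" <<'EOF'
#!/bin/sh
echo "called wrapper script"
EOF
chmod +x "$wrapper_dir/demo_alias_script"

# The child process resolves commands through PATH, so the script is found.
if PATH="$wrapper_dir:$PATH" sh -c 'command -v demo_alias_script' >/dev/null 2>&1; then
  script_visible=yes
else
  script_visible=no
fi

echo "function visible to child process: $fn_visible"
echo "wrapper script visible to child process: $script_visible"

# Clean up the temporary wrapper directory.
rm -rf "$wrapper_dir"
```

This only covers command lookup. Running the wrapper that was found still requires singularity to be callable from wherever the wrapper is executed, which is the part that fails inside the relion container.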

Is there a good solution to this problem already in place, where module containers need to call each other's exposed aliases?

Is installing singularity in all the module containers to allow them to nest with each other a valid solution?

Is it okay to mount the singularity install from the host into all the containers using bind parameters?

Vis-à-vis nesting singularity container instances I found this related issue: https://github.com/apptainer/singularity/issues/5759#issuecomment-919523970 Is this workaround still required? It could potentially be injected into wrapper scripts using a snippet.
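
Independent of the linked workaround, a snippet injected into a wrapper script could at least detect the failure described above, where no container runtime is callable from inside the calling container. The following is a rough sketch under assumptions: `find_runtime` is a hypothetical helper, and the runtime names it checks (`singularity`, `apptainer`) are only examples of what a site might look for. It does not make nesting work; it only reports whether a runtime is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a wrapper script: print the first candidate
# command that is callable from the current environment.
find_runtime() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # None of the candidates could be found on PATH.
  return 1
}

# Example use inside a wrapper (the runtime names are an assumption).
if runtime="$(find_runtime singularity apptainer)"; then
  echo "container runtime available: $runtime"
else
  echo "no container runtime on PATH; this wrapper cannot start another container here" >&2
fi
```

A check like this turns a confusing "command not found" from deep inside the GUI into an explicit message about why the second container cannot be started.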

vsoch commented 2 years ago

hey @JossWhittle ! We have been thinking about a variant of this - which might be the same thing, but instead of a focus on one module finding the other alias, it's more generally about grouping: https://github.com/singularityhub/singularity-hpc/issues/527.

> Is there a good solution to this problem already in place, where module containers need to call each other's exposed aliases?

There isn't currently anything in place, but I'm hoping we will find a good solution to this issue of loading a group.

> Is installing singularity in all the module containers to allow them to nest with each other a valid solution?

I think generally singularity is installed on the host. I don't see how you'd successfully use it inside another container.

> Is it okay to mount the singularity install from the host into all the containers using bind parameters?

You could try, but I suspect a lot of functionality won't work.

> Vis-à-vis nesting singularity container instances I found this related issue: https://github.com/apptainer/singularity/issues/5759#issuecomment-919523970

That's really interesting - I never would have guessed that could work!

> Is this workaround still required? It could potentially be injected into wrapper scripts using a snippet.

You could try that!

I would follow the linked issue thread (or add a comment there) to mention a group being loaded together that can access one another's aliases. I don't think we've chatted about that specifically before.

muffato commented 2 years ago

I can see that both containers are under your institute's account, so it looks like you built those two images. As a workaround, @JossWhittle, would you be able to build an image that ships both relion and motioncor2, and call that "relion"?

JossWhittle commented 2 years ago

@muffato as a last resort, yes, but I'd rather not. Both relion and motioncor2 (and other similar programs) have multiple versions that differ substantially in how they process the data, and our scientists need to be able to mix and match versions of each to replicate and validate results based on what they were originally processed with. They can also sometimes have conflicting build dependency versions, which makes it difficult to have all versions installed side by side natively.

To avoid needing to build containers for all combinations, the ability to have them loaded side by side containerized is preferable.
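
To put rough numbers on the combinations argument, here is a small sketch. The version labels are placeholders, not real release numbers, and `count_words` is a hypothetical helper. It compares how many images are needed if every relion/motioncor2 pairing gets its own combined image with how many are needed if each version is containerized separately.

```shell
#!/usr/bin/env bash
# Placeholder version labels; these are not real release numbers.
relion_versions="relion-A relion-B relion-C"
motioncor2_versions="motioncor2-A motioncor2-B motioncor2-C motioncor2-D"

# Hypothetical helper: count whitespace-separated words in a string.
count_words() { set -- $1; echo "$#"; }

n_relion="$(count_words "$relion_versions")"
n_motioncor2="$(count_words "$motioncor2_versions")"

# One combined image per pairing grows as a product of the version counts;
# one image per tool version grows as a sum.
combined_images=$(( n_relion * n_motioncor2 ))
separate_images=$(( n_relion + n_motioncor2 ))

echo "combined images needed: $combined_images"
echo "separate images needed: $separate_images"
```

With these placeholder lists, the combined approach needs 12 images and the separate approach needs 7, and the gap widens with every additional tool or version.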

@vsoch #527 looks interesting; I'll keep an eye on that thread. Calling a container from within a container definitely feels like bad things could happen. I'll have to keep experimenting and see.