singularityhub / shpc-registry

A remote registry for Singularity Registry HPC
https://singularityhub.github.io/shpc-registry/
Mozilla Public License 2.0

updating modules within Bright Cluster with Lmod #227

Open SomePersonSomeWhereInTheWorld opened 1 month ago

SomePersonSomeWhereInTheWorld commented 1 month ago

We have a few Bright Computing clusters (Bright is now owned by NVIDIA and renamed Base Command Manager); in one case, Lmod version 8.3 is installed on the head node. We'd like to make Brainiak available as an shpc module for all users. What would the correct command be to replace the following?

And then you can tell lmod about your modules folder:

$ module use ./modules

Also, based on the getting started instructions:

if you install to a system python, meaning either of these commands:

python setup.py install
pip install .

We have several versions of Anaconda Python loadable as modules. Should we just pick the latest and install shpc as root, so the shpc command becomes available for everyone who loads that module?
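In other words, something along these lines (the anaconda module name is just an example from our site):

$ module load anaconda3/2023.03
$ git clone https://github.com/singularityhub/singularity-hpc
$ cd singularity-hpc
$ pip install .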

vsoch commented 1 month ago

The module command might depend on your module software, but generally speaking, shpc is going to create modules (lmod or environment modules) that you need to add to the equivalent of your module path. You should only need shpc to generate those original modules and pull the container, and then the subsequent commands depend on the module software you are using.
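Roughly, the flow looks like this (paths and the module name are illustrative):

$ shpc install python                # pull the container and generate the module
$ module use /path/to/shpc/modules   # the module_base from your shpc settings
$ module load python/3.9.2-slim     # then the aliases/wrappers are available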

SomePersonSomeWhereInTheWorld commented 1 month ago

> The module command might depend on your module software, but generally speaking, shpc is going to create modules (lmod or environment modules) that you need to add to the equivalent of your module path. You should only need shpc to generate those original modules and pull the container, and then the subsequent commands depend on the module software you are using.

OK, I see how shpc config edit can be used to edit module_base.

Now module use ./modules works. Perhaps add a footnote or comment to make sure the path to the modulefiles is set before running this? Just a suggestion.
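For reference, the non-interactive equivalent seems to be something like this (syntax may vary by shpc version, and the path is illustrative):

$ shpc config set module_base:/shared/apps/shpc/modules
$ module use /shared/apps/shpc/modules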

The following example:

singularity exec brainiak/brainiak_latest.sif "$@"
Error for command "exec": requires at least 2 arg(s), only received 1

Usage:
  singularity [global options...] exec [exec options...] <container> <command>

Run 'singularity --help' for more detailed usage information.

Should probably be updated.
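Presumably exec needs an explicit command after the image, e.g. (the command here is just a guess):

$ singularity exec brainiak/brainiak_latest.sif python3 -c "import brainiak"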

I also get this:

$ singularity exec brainiak/brainiak_latest.sif  /mnt/brainiak/tutorials/run_jupyter_docker.sh 
/usr/bin/python3: No module named notebook
$ singularity run brainiak/brainiak_latest.sif  /mnt/brainiak/tutorials/run_jupyter_docker.sh 
/usr/bin/python3: No module named notebook
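One way to check whether the notebook package exists in the image at all (assuming pip is available inside it):

$ singularity exec brainiak/brainiak_latest.sif python3 -m pip list | grep -i notebook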

Are these issues related? Is the Docker container broken? https://github.com/brainiak/brainiak/issues/517 https://github.com/brainiak/brainiak/pull/539

vsoch commented 1 month ago

You'd want to test the specific commands that are associated with those containers, and open a PR if one isn't working. The registry is community maintained (and updated automatically), so it's possible (and not uncommon) that a particular entrypoint doesn't work.

I'm not sure you are showing me all the commands you are running, but running singularity exec directly against a module-owned image isn't exactly the use case for shpc; you should be using the wrapper scripts or aliases that shpc provides in the module file.

SomePersonSomeWhereInTheWorld commented 1 month ago

> You'd want to test the specific commands that are associated with those containers, and open a PR if one isn't working. The registry is community maintained (and updated automatically), so it's possible (and not uncommon) that a particular entrypoint doesn't work.
>
> I'm not sure you are showing me all the commands you are running, but running singularity exec directly against a module-owned image isn't exactly the use case for shpc; you should be using the wrapper scripts or aliases that shpc provides in the module file.

Yes, I see these:

       - brainiak-srm-run:
             singularity run -B <wrapperDir>/99-shpc.sh:/.singularity.d/env/99-shpc.sh <container> "$@"
       - brainiak-srm-shell:
             singularity shell -s /bin/sh -B <wrapperDir>/99-shpc.sh:/.singularity.d/env/99-shpc.sh <container>
       - brainiak-srm-exec:
             singularity exec -B <wrapperDir>/99-shpc.sh:/.singularity.d/env/99-shpc.sh <container> "$@"

My understanding is that, for the -B option, the left side of the colon is the path on the host and the right side is the path inside the container.

Where is '99-shpc.sh' coming from? I see it's in the container, but is the idea to copy the file from the container to your local directory?

vsoch commented 1 month ago

-B is a bind request (the other form of that flag is --bind), and those commands show binding the environment file from the wrapper directory to the shpc environment. That isn't the actual command, because the actual command has a full path to that file. If you do an exec, you need to provide a command; if you use shell, you shell in; and run will hit the container entrypoint.
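For example (paths illustrative), this makes /host/data on the host visible as /data inside the container for the duration of the command:

$ singularity exec -B /host/data:/data brainiak_latest.sif ls /data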

Try looking at the script directory in the wrapper directory (that is an shpc setting) to see what is actually being run. Then you can debug directly with the container. shpc can't take ownership of the scope of issues that can happen with containers, but we definitely accept fixes to any of the container.yaml files that generate the modules.
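For example (module name and path illustrative), module show will print the module body, including where the wrapper scripts live, and then you can read a script directly:

$ module show brainiak/latest
$ cat /path/to/wrappers/brainiak/latest/bin/brainiak-srm-exec   # path taken from the module show output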