snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster

"module command not found" using Snakemake with Slurm #63

Closed: kevinrue closed this issue 6 months ago

kevinrue commented 7 months ago

I think I've initially posted in the wrong repo. Please see https://github.com/snakemake/snakemake/issues/2802#issuecomment-2049297585

Especially as --executor none works fine, but my profile using --executor slurm fails to load an environment module.

Given my discussion with IT so far, we suspect that whatever shell is launched on the compute node isn't a login shell, which is a requirement to access the module function (at least on our cluster).
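
For reference, two quick bash checks for exactly these two conditions (a sketch, nothing cluster-specific assumed):

$ shopt -q login_shell && echo "login shell" || echo "not a login shell"
$ type -t module    # prints "function" if the module command is defined in this shell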

kevinrue commented 7 months ago

Not being a complete expert on many of the topics involved, and following one link to another, I've gone from subprocess.check_output to https://docs.python.org/3/library/subprocess.html#subprocess.Popen which states:

On POSIX with shell=True, the shell defaults to /bin/sh.

while most of what we do on our cluster relies on /bin/bash in particular, including that issue of login shells needed to access modules.

Could that /bin/sh thing be the issue?

Happy to be the guinea pig testing things if someone is willing to guide me through the process, as I've been trying to solve that problem for a whole day now and I've been going in circles and losing track of what to focus on.
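
One direct way to test the /bin/sh hypothesis (a sketch; the -l requests login-shell behaviour, which is what sources the profile scripts defining module on many clusters):

$ /bin/sh -c 'type module'      # non-login POSIX shell, as subprocess with shell=True would use
$ /bin/bash -lc 'type module'   # bash login shell, as in an interactive session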

cmeesters commented 7 months ago

Well,

Could that /bin/sh thing be the issue?

Try,

$ md5sum /bin/sh /bin/bash
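$ readlink -f /bin/sh /bin/bash   # complementary check (sketch): reveals symlinks, e.g. /bin/sh often points at dash on Debian-family systems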

chances are they are the same. Even if not, I can think of two different causes here:

Could you take your minimal example, start it with snakemake --verbose ..., and post the logs from both the login node and the job output here (or attach them), please?

Snakemake will attempt to run the module command in the job context; if the module command is not present there, it is most likely an environment issue. Yet, we actively export the environment (which ought to be the default, anyway). Hence, a second thing to check would be running env both on the login node and within a Snakemake rule, and including the output here.
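
A concrete way to run that comparison without touching the workflow could be (a sketch; the partition name short is taken from your profile, and the file names are arbitrary):

$ env | sort > login_env.txt
$ srun -p short env | sort > job_env.txt
$ diff login_env.txt job_env.txt   # anything missing on the job side is lost in the job context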

Also: Which module system are you using (output of which module)?

kevinrue commented 7 months ago

Thanks for the quick reply!

Interesting, they're different 👀

$ md5sum /bin/sh /bin/bash
7409ae3f7b10e059ee70d9079c94b097  /bin/sh
d7bc3ce3b6b7ac53ba2918a97d806418  /bin/bash

With --verbose

$ snakemake --verbose --use-envmodules
Using profile default for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: 6ec84bc7-c457-42a8-9655-1561f880896c
Using shell: /usr/bin/bash
Provided remote nodes: 100
Job stats:
job                   count
------------------  -------
cellranger_version        1
total                     1

Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 100}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 99}
Execute 1 jobs...

[Thu Apr 11 13:46:36 2024]
rule cellranger_version:
    jobid: 0
    reason: Rules with neither input nor output files are always executed.
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_partition=short, runtime=10

cellranger --version
No SLURM account given, trying to guess.
Unable to guess SLURM account. Trying to proceed without.
General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ignore-incomplete', '', '--verbose ', '--rerun-triggers software-env code params input mtime', '', '', '--deployment-method conda env-modules', '--conda-frontend mamba', '', '--conda-base-path /ceph/project/sims-lab/albrecht/miniforge3', '', '', '', '--shared-fs-usage sources input-output storage-local-copies persistence source-cache software-deployment', '', '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '', '', '--printshellcmds ', '--latency-wait 15', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/bin', '', '', '', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPXNob3J0 base64//cnVudGltZT0xMA==', '']
sbatch call: sbatch --job-name 6ec84bc7-c457-42a8-9655-1561f880896c --output /ceph/project/tendonhca/albrecht/test-snakemake/.snakemake/slurm_logs/rule_cellranger_version/%j.log --export=ALL --comment cellranger_version -p short -t 10 --mem 1000 --ntasks=1 --cpus-per-task=1 -D /ceph/project/tendonhca/albrecht/test-snakemake --wrap="/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/bin/python3.12 -m snakemake --snakefile /ceph/project/tendonhca/albrecht/test-snakemake/Snakefile --target-jobs 'cellranger_version:' --allowed-rules 'cellranger_version' --cores all --attempt 1 --force-use-threads  --resources 'mem_mb=1000' 'mem_mib=954' 'disk_mb=1000' 'disk_mib=954' --wait-for-files '/ceph/project/tendonhca/albrecht/test-snakemake/.snakemake/tmp.4cxxt4o_' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers software-env code params input mtime --deployment-method conda env-modules --conda-frontend mamba --conda-base-path /ceph/project/sims-lab/albrecht/miniforge3 --shared-fs-usage sources input-output storage-local-copies persistence source-cache software-deployment --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --printshellcmds  --latency-wait 15 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/bin --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPXNob3J0 base64//cnVudGltZT0xMA== --executor slurm-jobstep --jobs 1 --mode remote"
Job 0 has been submitted with SLURM jobid 1194995 (log: /ceph/project/tendonhca/albrecht/test-snakemake/.snakemake/slurm_logs/rule_cellranger_version/1194995.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-09T13:00 --endtime now --name 6ec84bc7-c457-42a8-9655-1561f880896c
It took: 0.09709572792053223 seconds
The output is:
'1194995|PENDING
'

status_of_jobs after sacct is: {'1194995': 'PENDING'}
active_jobs_ids_with_current_sacct_status are: {'1194995'}
active_jobs_seen_by_sacct are: {'1194995'}
missing_sacct_status are: set()
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-09T13:00 --endtime now --name 6ec84bc7-c457-42a8-9655-1561f880896c
It took: 0.09258508682250977 seconds
The output is:
'1194995|PENDING
'

status_of_jobs after sacct is: {'1194995': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'1194995'}
active_jobs_seen_by_sacct are: {'1194995'}
missing_sacct_status are: set()
[Thu Apr 11 13:50:34 2024]
Error in rule cellranger_version:
    message: SLURM-job '1194995' failed, SLURM status is: 'FAILED'. For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 0
    log: /ceph/project/tendonhca/albrecht/test-snakemake/.snakemake/slurm_logs/rule_cellranger_version/1194995.log (check log file(s) for error details)
    shell:
        cellranger --version
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 1194995

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-11T134633.623105.snakemake.log
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.

WorkflowError:
At least one job did not complete successfully.

Not sure why I see --executor slurm-jobstep in the log above when my profile says:

#cluster: qsub
executor: slurm
jobscript: "slurm-jobscript.sh"
jobs: 100
latency-wait: 15
default-resources:
  - slurm_partition=short
  - runtime=10
  #- mem_mb=6000
  #- disk_mb=1000000
rerun-incomplete: True
printshellcmds: True
use-conda: True

(And I haven't installed the slurm-jobstep plugin.)

cmeesters commented 7 months ago

and I haven't installed the slurm-jobstep plugin.

That should be installed automatically if you install this plugin via conda/mamba/etc. However, if it is really missing, please make sure it is present: Snakemake submits itself, and to work properly in the job context it needs that plugin. You can check this with

$ mamba list | grep snakemake

or conda in place of mamba.

After installation, please try again and report the output and log of

rule my_cellranger:
    envmodules:
        "cellranger/7.2.0"
   log: "logs/cellranger.log"
    shell:
        "env && cellranger --version &> {log}"

kevinrue commented 7 months ago

Sorry for the confusion. I didn't mean that it was not installed. I meant that I hadn't installed it explicitly myself.

$ mamba list | grep snakemake
# packages in environment at /ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake:
snakemake                 8.10.6               hdfd78af_0    bioconda
snakemake-executor-plugin-slurm 0.4.2                    pypi_0    pypi
snakemake-executor-plugin-slurm-jobstep 0.1.11                   pypi_0    pypi
snakemake-interface-common 1.17.1             pyhdfd78af_0    bioconda
snakemake-interface-executor-plugins 9.1.0              pyhdfd78af_0    bioconda
snakemake-interface-report-plugins 1.0.0              pyhdfd78af_0    bioconda
snakemake-interface-storage-plugins 3.1.1              pyhdfd78af_0    bioconda
snakemake-minimal         8.10.6             pyhdfd78af_0    bioconda

To clarify, I've installed Snakemake and the snakemake-executor-plugin-slurm plugin in a brand new environment using the following YAML file:

name: snakemake
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.12
  - snakemake
  - pip:
    - snakemake-executor-plugin-slurm

Let me know if I need to reinstall anything (including the whole Conda environment), but in the meantime, the log of the adjusted rule you asked for is (after adjusting the missing indentation space in front of log:):

cellranger.log

/usr/bin/bash: line 1: cellranger: command not found

Do you need the console log or the JOBID.log too?

It seems that the env command somehow had its output sent to JOBID.log.
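
That is expected bash behaviour rather than a Snakemake quirk: in env && cellranger --version &> logs/cellranger.log, the &> redirection binds only to the last command of the chain, so env still writes to the job's stdout. A minimal demonstration (hypothetical path):

$ echo first && echo second &> /tmp/demo.log   # "first" still reaches the terminal
$ cat /tmp/demo.log
second

The JOBID.log in question: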

[Thu Apr 11 14:30:50 2024]
localrule my_cellranger:
    log: logs/cellranger.log
    jobid: 0
    reason: Rules with neither input nor output files are always executed.
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/scratch/albrecht/1195033, slurm_partition=short, runtime=10

env && cellranger --version &> logs/cellranger.log
Activating environment modules: cellranger/7.2.0
/usr/bin/bash: line 1: module: command not found
SHELL=/bin/bash
COLORTERM=truecolor
SLURM_STEP_NUM_TASKS=1
SLURM_JOB_USER=albrecht
SLURM_TASKS_PER_NODE=1
SLURM_JOB_UID=20335
TERM_PROGRAM_VERSION=1.88.0
SLURM_TASK_PID=237274
DRMAA_LIBRARY_PATH=/usr/lib/libdrmaa.so.1.0.8
CONDA_EXE=/ceph/project/sims-lab/albrecht/miniforge3/bin/conda
_CE_M=
SLURM_LOCALID=0
PYTHONNOUSERSITE=literallyanyletters
SLURM_SUBMIT_DIR=/ceph/project/tendonhca/albrecht/test-snakemake
HOSTNAME=imm-wn7
SLURMD_NODENAME=imm-wn7
NUMEXPR_NUM_THREADS=1
SLURM_STEP_NODELIST=imm-wn7
SLURM_NODE_ALIASES=(null)
SLURM_CLUSTER_NAME=ccb-u22
SLURM_CPUS_ON_NODE=1
SLURM_UMASK=0022
SLURM_JOB_CPUS_PER_NODE=1
PWD=/project/tendonhca/albrecht/test-snakemake
SLURM_GTIDS=0
LOGNAME=albrecht
XDG_SESSION_TYPE=tty
CONDA_PREFIX=/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake
SLURM_JOB_PARTITION=short
MODULESHOME=/usr/share/modules
MANPATH=:
SLURM_JOB_NUM_NODES=1
SRUN_DEBUG=3
SLURM_STEPID=0
TEMPDIR=/var/scratch/albrecht/1195033
VSCODE_GIT_ASKPASS_NODE=/ceph/project/sims-lab/albrecht/.vscode-server/cli/servers/Stable-5c3e652f63e798a5ac2f31ffd0d863669328dc4c/server/node
OPENBLAS_NUM_THREADS=1
SLURM_JOBID=1195033
SLURM_LAUNCH_NODE_IPADDR=127.0.0.1
SLURM_JOB_QOS=normal
MOTD_SHOWN=pam
HOME=/home/a/albrecht
LANG=en_GB.UTF-8
SLURM_PROCID=0
SSL_CERT_DIR=/usr/lib/ssl/certs
CONDA_PROMPT_MODIFIER=(snakemake) 
TMPDIR=/var/scratch/albrecht/1195033
GIT_ASKPASS=/ceph/project/sims-lab/albrecht/.vscode-server/cli/servers/Stable-5c3e652f63e798a5ac2f31ffd0d863669328dc4c/server/extensions/git/dist/askpass.sh
SLURM_CPUS_PER_TASK=1
SLURM_NTASKS=1
SLURM_TOPOLOGY_ADDR=imm-wn7
SSH_CONNECTION=129.67.117.38 51678 163.1.16.102 22
VECLIB_MAXIMUM_THREADS=1
GOTO_NUM_THREADS=1
SLURM_DISTRIBUTION=cyclic
VSCODE_GIT_ASKPASS_EXTRA_ARGS=
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_SRUN_COMM_HOST=127.0.0.1
XDG_SESSION_CLASS=user
SLURM_SCRIPT_CONTEXT=prolog_task
SLURM_MEM_PER_NODE=1000
SLURM_WORKING_CLUSTER=ccb-u22:imm-slurm-u22:6817:9728:109
TERM=xterm-256color
_CE_CONDA=
USER=albrecht
SLURM_NODELIST=imm-wn7
VSCODE_GIT_IPC_HANDLE=/run/user/20335/vscode-git-80e91a337c.sock
ENVIRONMENT=BATCH
CONDA_SHLVL=2
SLURM_SRUN_COMM_PORT=44257
SNAKEMAKE_PROFILE=default
LOADEDMODULES=
TEMP=/var/scratch/albrecht/1195033
SLURM_STEP_ID=0
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=1
SHLVL=2
SLURM_NNODES=1
XDG_SESSION_ID=291740
SLURM_SUBMIT_HOST=imm-login3
CONDA_PYTHON_EXE=/ceph/project/sims-lab/albrecht/miniforge3/bin/python
SLURM_JOB_CPUS_PER_NODE_PACK_GROUP_0=1
XDG_RUNTIME_DIR=/run/user/20335
SLURM_JOB_ID=1195033
SSL_CERT_FILE=/usr/lib/ssl/certs/ca-certificates.crt
SLURM_NODEID=0
SLURM_STEP_NUM_NODES=1
SSH_CLIENT=129.67.117.38 51678 22
CONDA_DEFAULT_ENV=snakemake
OMP_NUM_THREADS=1
SLURM_STEP_TASKS_PER_NODE=1
VSCODE_GIT_ASKPASS_MAIN=/ceph/project/sims-lab/albrecht/.vscode-server/cli/servers/Stable-5c3e652f63e798a5ac2f31ffd0d863669328dc4c/server/extensions/git/dist/askpass-main.js
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
TMP=/var/scratch/albrecht/1195033
SLURM_CONF=/etc/slurm/slurm.conf
BROWSER=/ceph/project/sims-lab/albrecht/.vscode-server/cli/servers/Stable-5c3e652f63e798a5ac2f31ffd0d863669328dc4c/server/bin/helpers/browser.sh
ALTERNATE_EDITOR=
PATH=/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/bin:/ceph/project/sims-lab/albrecht/miniforge3/condabin:/ceph/project/sims-lab/albrecht/.vscode-server/cli/servers/Stable-5c3e652f63e798a5ac2f31ffd0d863669328dc4c/server/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/a/albrecht/.local/bin:/home/a/albrecht/bin:/home/a/albrecht/.local/bin:/home/a/albrecht/bin
SLURM_JOB_NAME=cf7de4a0-a8b4-42d3-903a-e91311debc93
MODULEPATH=/etc/environment-modules/modules:/usr/share/modules/versions:/usr/share/modules/$MODULE_VERSION/modulefiles:/usr/share/modules/modulefiles:/package/modulefiles
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/20335/bus
MKL_NUM_THREADS=1
CONDA_PREFIX_1=/ceph/project/sims-lab/albrecht/miniforge3
SLURM_STEP_LAUNCHER_PORT=44257
SLURM_JOB_GID=20335
OLDPWD=/project/tendonhca/albrecht
SLURM_JOB_NODELIST=imm-wn7
MODULES_CMD=/usr/lib/x86_64-linux-gnu/modulecmd.tcl
TERM_PROGRAM=vscode
VSCODE_IPC_HOOK_CLI=/run/user/20335/vscode-ipc-62916236-657b-4b2a-9f00-fd4fdc78a356.sock
_=/usr/bin/env
Full Traceback (most recent call last):
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/executors/local.py", line 420, in run_wrapper
    run(
  File "/ceph/project/tendonhca/albrecht/test-snakemake/Snakefile", line 17, in __rule_my_cellranger
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/shell.py", line 297, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'module purge && module load cellranger/7.2.0; set -euo pipefail;  env && cellranger --version &> logs/cellranger.log' returned non-zero exit status 127.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/executors/local.py", line 259, in _callback
    raise ex
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/executors/local.py", line 245, in cached_or_run
    run_func(*args)
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/executors/local.py", line 456, in run_wrapper
    raise RuleException(
snakemake.exceptions.RuleException: CalledProcessError in file /ceph/project/tendonhca/albrecht/test-snakemake/Snakefile, line 6:
Command 'module purge && module load cellranger/7.2.0; set -euo pipefail;  env && cellranger --version &> logs/cellranger.log' returned non-zero exit status 127.

RuleException:
CalledProcessError in file /ceph/project/tendonhca/albrecht/test-snakemake/Snakefile, line 6:
Command 'module purge && module load cellranger/7.2.0; set -euo pipefail;  env && cellranger --version &> logs/cellranger.log' returned non-zero exit status 127.
[Thu Apr 11 14:30:50 2024]
Error in rule my_cellranger:
    jobid: 0
    log: logs/cellranger.log (check log file(s) for error details)
    shell:
        env && cellranger --version &> logs/cellranger.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Storing output in storage.
Full Traceback (most recent call last):
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/ceph/project/sims-lab/albrecht/miniforge3/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.
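
For what it's worth, the composed command from the traceback can be replayed outside Snakemake to confirm where the exit status 127 comes from (a sketch; /usr/bin/bash matches the "Using shell" line in the log):

$ /usr/bin/bash -c 'module purge && module load cellranger/7.2.0'; echo "exit status: $?"
# expected here: "module: command not found" and "exit status: 127",
# i.e. the same failure as in the CalledProcessError above
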
cmeesters commented 7 months ago

ah, you are working with system c-modules. I am not sure how up-to-date your version is, so:

What is - on the login node - the output of:

module avail |& grep -i cellranger

And the same in job context. Get an interactive job, for example:

$ srun -p short --pty bash -i

and then run

$ module avail |& grep -i cellranger

You might have module spider celranger available, or I guessed wrongly and you have to adapt your interactive job command ...

kevinrue commented 7 months ago

ah, you are working with system c-modules. I am not sure how up-to-date your version is

I never paid attention to what kind of module system we're asked to use 😬

module --help says: Modules Release 5.0.1 (2021-10-16)

Login node

$ module avail |& grep -i cellranger
alicevision/2.2.0       cellranger-arc/2.0.2   fastq_screen/0.14.1   impute2/2.3.2                     onetbb/2021.3.0       qupath/0.2.3         sratoolkit/2.10.8        
annovar/20180416        cellranger-atac/1.2.0  fastqc/0.11.5         index-hopping-filter/1.1          openjdk/9.0.4         R-base-lapack/4.3.0  sratoolkit/3.0.0         
annovar/20191024        cellranger-atac/2.1.0  fastqc/0.11.9         infercnv/current                  openjdk/10.0.2        R-base/4.3.0         star-fusion/1.11.0       
annovar/20230914        cellranger/2.1.1       fftw/3.3.10           jags/4.3.0                        openjdk/11.0.2        R-cbrg/202307        STAR/2.6.0c              
aspera/3.9.8            cellranger/3.0.1       fgbio/20210122        java/17.0.1                       openjdk/12.0.2        R-cbrg/202310        STAR/2.6.1d              
aspera/3.9.9            cellranger/3.1.0       fiji/20190409         java/17.0.10                      openjdk/13.0.2        R-cbrg/202401        STAR/2.7.3a              
atacqc/20170728         cellranger/4.0.0       flash/1.2.11          java/21.0.2                       openjdk/14.0.2        R-cbrg/202404        STAR/2.7.8a              
avocato-deps/20221121   cellranger/5.0.0       flexbar/3.5.0         jellyfish/2.3.0                   openjdk/15.0.2        R-cbrg/current       STAR/2.7.11a             
avocato-deps/20230726   cellranger/6.0.1       freebayes/1.3.4       jq/1.6                            openjdk/16.0.2        R-natverse/20230822  stringtie/2.1.4          
aws-cli/20220822        cellranger/6.1.1       freesasa/2.0.3        juicebox/1.9.8                    openjdk/17.0.2        racon/1.4.3          subread/2.0.0            
bam-readcount/20201113  cellranger/6.1.2       gatk/4.2.0.0v2        juicebox/2.10.01                  openjdk/18.0.2        racon/1.4.21         subread/2.0.3            
bamscale/0.0.5          cellranger/7.0.0       gatk/4.3.0.0v2        julia/1.4.1                       openjdk/19.0.1        regtools/1.0.0       subread/2.0.6            
bamscale/1.0            cellranger/7.0.1       gatk/4.4.0.0          kallisto/0.46.1                   openjdk/20.0.2        repeatmasker/4.1.2   subset-bam/1.1.0         
bamtools/2.5.1          cellranger/7.1.0       gctree/20210514       lanceotron/20230726               pandaseq/2.11         rmblast/2.10.0       SWIG/4.2.1               
bamutil/1.0.14          cellranger/7.2.0       genrich/0.6.1         libBigWig/0.4.4                   pear/20210121         rna-star/2.6.0c      tdrmapper/20210407       
cellranger-arc/1.0.1    fastq-pair/1.0         ilastik/1.3.3post3    octopus/20200902                  qtltools/1.2          sqlite/3.37.2        
cellranger-arc/2.0.0    fastq-tools/0.8.3      imacyte/20200929      omero/5.2.4                       qtltools/1.3.1        sratoolkit/2.9.6

srun

$ module avail |& grep -i cellranger
(output identical to the login-node listing above)

There's a typo (missing L) in your module spider celranger, but regardless, I'm not sure what you mean by the spider bit. I don't see any module matching 'spider'.

Also, I'm not sure what your |& grep -i is doing. It seems to me you're looking for available cellranger modules, which I usually do with simply module avail cellranger, which gives

$ module avail cellranger
---------------------------------------------------------------------------------- /package/modulefiles -----------------------------------------------------------------------------------
cellranger-arc/1.0.1  cellranger-arc/2.0.2   cellranger-atac/2.1.0  cellranger/3.0.1  cellranger/4.0.0  cellranger/6.0.1  cellranger/6.1.2  cellranger/7.0.1  cellranger/7.2.0  
cellranger-arc/2.0.0  cellranger-atac/1.2.0  cellranger/2.1.1       cellranger/3.1.0  cellranger/5.0.0  cellranger/6.1.1  cellranger/7.0.0  cellranger/7.1.0  

Key:
modulepath

Thanks again for your time so far!

cmeesters commented 7 months ago

module spider ... is the search command for Lmod - I was under the impression that the C version implements it as well, but your version might simply be too old. (As for the |&: module writes its listings to stderr, so |& pipes stderr along with stdout into grep - a plain | would miss the output.)

Anyway, the module command works in the job context and the module path is the same.

I need some time - I will be teaching next week.

kevinrue commented 7 months ago

No worries. Thanks for the help. I've got my workaround for now, which is that I've got few enough samples to write and submit the job scripts myself using sbatch.

If only I could figure out what changed since I last used snakemake back in January. Really weird.

kevinrue commented 7 months ago

FWIW

$ module --help
Modules Release 5.0.1 (2021-10-16)
Usage: module [options] [command] [args ...]

Loading / Unloading commands:
  add | load      modulefile [...]  Load modulefile(s)
  try-add | try-load modfile [...]  Load modfile(s), no complain if not found
  rm | unload     modulefile [...]  Remove modulefile(s)
  purge                             Unload all loaded modulefiles
  reload                            Unload then load all loaded modulefiles
  switch | swap   [mod1] mod2       Unload mod1 and load mod2
  refresh                           Refresh loaded module volatile components

Listing / Searching commands:
  list            [-t|-l|-j]        List loaded modules
  avail   [-d|-L] [-t|-l|-j] [-a] [-S|-C] [--indepth|--no-indepth] [mod ...]
                                    List all or matching available modules
  aliases         [-a]              List all module aliases
  whatis [-a] [-j] [modulefile ...] Print whatis information of modulefile(s)
  apropos | keyword | search [-a] [-j] str
                                    Search all name and whatis containing str
  is-loaded       [modulefile ...]  Test if any of the modulefile(s) are loaded
  is-avail        modulefile [...]  Is any of the modulefile(s) available
  info-loaded     modulefile        Get full name of matching loaded module(s)

Collection of modules handling commands:
  save            [collection|file] Save current module list to collection
  restore         [collection|file] Restore module list from collection or file
  saverm          [collection]      Remove saved collection
  saveshow        [collection|file] Display information about collection
  savelist        [-t|-l|-j]        List all saved collections
  is-saved        [collection ...]  Test if any of the collection(s) exists

Environment direct handling commands:
  prepend-path [-d c] var val [...] Prepend value to environment variable
  append-path [-d c] var val [...]  Append value to environment variable
  remove-path [-d c] var val [...]  Remove value from environment variable

Other commands:
  help            [modulefile ...]  Print this or modulefile(s) help info
  display | show  modulefile [...]  Display information about modulefile(s)
  test            [modulefile ...]  Test modulefile(s)
  use     [-a|-p] dir [...]         Add dir(s) to MODULEPATH variable
  unuse           dir [...]         Remove dir(s) from MODULEPATH variable
  is-used         [dir ...]         Is any of the dir(s) enabled in MODULEPATH
  path            modulefile        Print modulefile path
  paths           modulefile        Print path of matching available modules
  clear           [-f]              Reset Modules-specific runtime information
  source          scriptfile [...]  Execute scriptfile(s)
  config [--dump-state|name [val]]  Display or set Modules configuration
  sh-to-mod       shell shellscript [arg ...]
                                    Make modulefile from script env changes
  edit            modulefile        Open modulefile in editor

and

$ module spider
ERROR: Invalid command 'spider'
  Try 'module --help' for more information.

cmeesters commented 6 months ago

Stupid me! Did you run Snakemake with --sdm=env-modules?

kevinrue commented 6 months ago

Thanks for coming back to me.

I just did and I see no difference:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954
Select jobs to execute...
Execute 1 jobs...

[Fri Apr 26 09:53:25 2024]
rule my_cellranger:
    log: logs/cellranger.log
    jobid: 0
    reason: Rules with neither input nor output files are always executed.
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_partition=short, runtime=10

. /etc/profile.d/modules.sh && env && cellranger --version &> logs/cellranger.log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954
Select jobs to execute...
Execute 1 jobs...

[Fri Apr 26 09:53:33 2024]
localrule my_cellranger:
    log: logs/cellranger.log
    jobid: 0
    reason: Rules with neither input nor output files are always executed.
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/scratch/albrecht/1280293, slurm_partition=short, runtime=10

env && cellranger --version &> logs/cellranger.log
Activating environment modules: cellranger/7.2.0
/usr/bin/bash: line 1: module: command not found

A couple of thoughts:

  1. IT told me only login shells define the module function. I cannot figure out whether the snakemake slurm plugin launches login shells (bash --login) or not. Also, I've recently run into a login shell issue with VScode, and I wonder if this could be related here: https://github.com/microsoft/vscode-remote-release/issues/1671#issuecomment-2049250849
  2. IT also told me that Slurm jobs receive environment variables from the submission shell, but not Bash functions, which again may explain why the module Bash function doesn't make it to whatever shell the plugin uses to load the environment module (a quick check is sketched below).
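
Point 2 is directly checkable: bash passes exported functions to child processes via BASH_FUNC_* environment variables, and sbatch --export=ALL forwards those like any other variable. A sketch of the check, plus a possible (untested) workaround, assuming the /etc/profile.d/modules.sh path seen elsewhere in this thread:

$ type -t module                  # "function" if module is defined in the current shell
$ env | grep BASH_FUNC_module     # non-empty if the function is exported to child processes
$ source /etc/profile.d/modules.sh && export -f module   # export it before running snakemake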

Perhaps I should get you in touch directly with our IT team rather than acting as middle-man and potentially fudging the information I'm passing back and forth?

cmeesters commented 6 months ago

Oh, this never-ending not-invented-here syndrome, resulting in how-do-we-deviate-best-from-community-standards ...

Yes, the module command is a shell function. And it should be available on compute nodes. Why? Because otherwise it forces people to write their batch scripts without documenting the software therein. Where else? In a different statement, issued before submission. (My 2 cents: most likely in .bashrc, where it will inevitably lead to conflicts over time, due to accumulating modules - at least for serious long-term analysis.)

Oh, wait: The plugin already does sbatch ... --export=ALL ... .

We strive to provide portable workflows, where the workflow should not need tinkering when shipped (because every bit of additional information lives in profile and/or config files). Here, you could write a rule to load the module (a local rule) and then submit your job, but that results in a slightly odd software pattern, right? So, yes, talk to your admins. (Snakemake launches no shell at all upon submission. It assumes a working shell with an environment satisfying its needs when in job context. You can see the launch when running Snakemake with --verbose.)
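
To see what that submission style preserves, one can mimic the plugin with a bare sbatch call (a sketch; the partition name comes from the profile above, and the result lands in the job's output file):

$ sbatch -p short --export=ALL --wrap 'type module || echo "module is not defined in the job shell"'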

As to your other question about the job log: Snakemake submits itself. Hence, it needs to figure out what kind of rule it is dealing with. It will receive a part of the workflow (e.g. one rule) and subsequently interpret it as local.

kevinrue commented 6 months ago

Either I misunderstood your suggestion, or it's also not working:

#shell.executable("/bin/bash")
#shell.prefix("source ~/.bashrc; ")

localrules: 
    module_init

rule all:
    input:
        "logs/cellranger.log"

rule module_init:
    output:
        "logs/module_init.log"
    shell:
        ". /etc/profile.d/modules.sh &> {output}"

rule my_cellranger:
    input:
        module="logs/module_init.log"
    output:
        "logs/cellranger.log"
    envmodules:
        "cellranger/7.2.0"
    # log: "logs/cellranger.log"
    shell:
        "env && cellranger --version &> {output}"
        #"echo $SHELL && which conda && conda info && module avail cellranger && env && cellranger --version &> {log}"

# shopt login_shell -> off and returns error code 1

Still getting /usr/bin/bash: line 1: module: command not found

Anyway, at this point don't worry. I'll check with IT again to see if they want to offer a solution, but a simple workaround is to manually load the modules in my Bash terminal before running snakemake, and not use the --use-envmodules option.

cmeesters commented 6 months ago

What I meant is to write a rule which explicitly performs module load. Which is silly, as every workaround is: workarounds should be temporary, yet they tend to become established. (Note also that each rule's shell command runs in its own shell process, so sourcing /etc/profile.d/modules.sh in one rule cannot make module available in another.)

If the module command does not work on the compute nodes, there is no way to trigger it there - all you can do is export the entire environment, which Snakemake does by default.