snakemake / snakemake-storage-plugin-fs

A Snakemake storage plugin that reads from and writes to a locally mounted filesystem using rsync.
MIT License

unsure usage #7

Closed: cmeesters closed this issue 7 months ago

cmeesters commented 8 months ago

Finally got to trying the profile with the storage plugin. With a setup analogous to the sample in the docs:

default-storage-provider: fs
shared-fs-usage:
  - persistence
  - sources
  - source-cache
local-storage-prefix: /localscratch/$SLURM_JOB_ID

and the current release 0.1.5, I get

WorkflowError:
Failed to create local storage prefix /localscratch/$SLURM_JOB_ID/fs
PermissionError: [Errno 13] Permission denied: '/localscratch/$SLURM_JOB_ID'

This is logical: Snakemake is invoked on the head node of a cluster, and the selected storage prefix only exists in a job context. The aim is to stage in input files within a Slurm job context. Hence, the stage-in needs to be delayed until the job starts. What is the way to achieve this? (The profile YAML resides on a global parallel file system.)
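
For what it is worth, my guess at the failure mode (purely an assumption about how the plugin handles the prefix): if the prefix were expanded along the lines of os.path.expandvars, the reference would survive literally on the login node, where the variable is unset, and the directory creation under /localscratch would then be denied:

import os

# Assumption: the prefix is expanded roughly like this. On the login node
# SLURM_JOB_ID is unset, so the reference survives literally and creating
# the directory under /localscratch is denied.
prefix = os.path.expandvars("/localscratch/$SLURM_JOB_ID")
print(prefix)  # -> /localscratch/$SLURM_JOB_ID (unchanged, variable unset)
os.makedirs(prefix, exist_ok=True)  # PermissionError: [Errno 13] Permission denied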

johanneskoester commented 8 months ago

So, you basically need different values for that parameter depending on the host machine: if the variable is available, you need that path; if not (i.e. on the login node), you need a different path (e.g. /tmp/...).

Snakemake profiles can be templated with YTE, see here :-)!

Just do something like this:

__use_yte__: True

__definitions__:
  - import os

default-storage-provider: fs
shared-fs-usage:
  - persistence
  - sources
  - source-cache
?if "SLURM_JOB_ID" in os.environ:
    local-storage-prefix: /localscratch/$SLURM_JOB_ID
?else:
    local-storage-prefix: /tmp/$USER/snakemake
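
To preview what the template renders to on a given host, you can run it through YTE directly; a minimal sketch, assuming YTE's process_yaml API and that the profile lives at profile/config.yaml:

from yte import process_yaml

# Render the templated profile as Snakemake would see it on this host:
# with SLURM_JOB_ID set you should get the /localscratch prefix,
# otherwise the /tmp fallback.
with open("profile/config.yaml") as template:
    print(process_yaml(template))
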
cmeesters commented 8 months ago

Oh, no! How cool is this? Thank you.

Believe me: the link to "here" is easily overlooked; at least that is what happened to me ;-). I will put an example in the slurm-executor docs a.s.a.p., because I think this will be of interest to many. And in the admin/user slides for the teaching material ...

cmeesters commented 8 months ago

Hm, with our little test Snakefile (no workflow profile), which looks like this:

rule all:
    input: "results/a.out"

rule test1:
    output: "results/a.out"
    shell: "touch {output}"

and the profile from above, I get:

[Tue Feb 20 13:57:31 2024]
localrule test1:
    output: results/a.out (send to storage)
    jobid: 0
    reason: Forced execution
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/localscratch/14834904, slurm_account=m2_zdvhpc, slurm_partition=smp

WorkflowError:
Failed to check existence of results/a.out
TypeError: 'str' object cannot be interpreted as an integer

srun: error: z0176: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=14834904.0
[Tue Feb 20 13:57:31 2024]
Error in rule test1:
    jobid: 0
    output: results/a.out (send to storage)
    shell:
        touch .snakemake/storage/fs/results/a.out
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

WorkflowError:
Failed to check existence of results/a.out
TypeError: 'str' object cannot be interpreted as an integer

Looks like the jobstep executor got the right string (tmpdir=/localscratch/14834904), but the local storage prefix does not match it: .snakemake/storage/fs

I actually wonder where that string comes from ... Sorry for spoiling your streak of luck.
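
If I read the plugin right (just an assumption on my part), the local copy path should be composed of the local storage prefix, the provider name, and the query, roughly like this:

from pathlib import Path

# Assumed composition of the staged path; the names are illustrative only.
local_storage_prefix = Path("/localscratch/14834904")  # what the jobstep sees
provider = "fs"
query = "results/a.out"

print(local_storage_prefix / provider / query)
# expected: /localscratch/14834904/fs/results/a.out
# but the shell command above touches .snakemake/storage/fs/results/a.out,
# i.e. the default prefix instead of the configured one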

johanneskoester commented 8 months ago

Can you please run this with --verbose, so that we can see the full error stack trace? Thanks!

johanneskoester commented 8 months ago

I think there are two issues here. One (the type error) has been resolved this morning with the release of snakemake-interface-common 1.17.1. The other one is the wrong path for the touched file.
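
For context, that type error is of the kind where a numeric command line value stays a string where an integer is expected; a hypothetical reproduction of the failure class, not the actual fixed code:

# Hypothetical reproduction: a numeric option that is never converted
# from str to int before being used as a count.
latency_wait = "1"  # e.g. taken verbatim from a --...-latency-wait flag
for attempt in range(latency_wait):
    pass
# TypeError: 'str' object cannot be interpreted as an integer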

cmeesters commented 8 months ago

Sure thing, that I can do even today, restricting myself to the output which, I think, is telling:

The submission string:

sbatch call: sbatch --job-name 910a7280-d6da-4cbf-9a93-44aa7d7048b4 --output /gpfs/fs1/home/meesters/projects/hpc-jgu-lifescience/snakemake-workflows/test-workflow/.snakemake/slurm_logs/rule_test1/%j.log --export=ALL --comment test1 -A m2_zdvhpc -p smp --mem 1000 --cpus-per-task=1 -D /gpfs/fs1/home/meesters/projects/hpc-jgu-lifescience/snakemake-workflows/test-workflow --wrap="python -m snakemake --snakefile '/gpfs/fs1/home/meesters/projects/hpc-jgu-lifescience/snakemake-workflows/test-workflow/Snakefile' --target-jobs 'test1:' --allowed-rules 'test1' --cores 'all' --attempt 1 --force-use-threads  --resources 'mem_mb=1000' 'mem_mib=954' 'disk_mb=1000' 'disk_mib=954'  --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers input mtime software-env code params --conda-frontend 'mamba' --shared-fs-usage sources persistence source-cache --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 10 --scheduler 'ilp' --set-resources 'test1:slurm_partition=smp' --storage-fs-latency-wait 1 --default-storage-provider 'fs' --default-resources 'mem_mb=min(max(2*input.size_mb, 1000), 8000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_partition=smp' --executor slurm-jobstep --jobs 1 --mode 'remote'"

The job log is almost unreadable in this thread, so I attached it: 14837473.log

I think your suspicion is half-confirmed. Alas: snakemake --version reports 8.4.12, and I still got the type error.

johanneskoester commented 7 months ago

Yes, but do you also have snakemake-interface-common 1.17.1?

cmeesters commented 7 months ago

Yes. However, after updating to the current version of Snakemake (8.5.1 at the time of writing), the errors vanished.