snakemake / snakemake-storage-plugin-fs

A Snakemake storage plugin that reads and writes from a locally mounted filesystem using rsync.
MIT License

wrong path mapping #14

Closed cmeesters closed 4 months ago

cmeesters commented 6 months ago

I'm afraid my testing of the resolution in #7 was incomplete.

With the configuration

$ cat ~/.config/snakemake/config.yaml 
__use_yte__: true

__definitions__:
  - import os
  - from pathlib import Path

executor: slurm
latency-wait: 5
default-storage-provider: fs
shared-fs-usage:
  - persistence
  - sources
  - source-cache
?if "SLURM_JOB_ID" in os.environ:
    local-storage-prefix: /localscratch/$SLURM_JOB_ID
?else:
    local-storage-prefix: /dev/shm/$USER/snakemake
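For illustration, the YTE conditional in this profile reduces to roughly the following logic (a plain-Python sketch of the intent, not what Snakemake actually executes; the fallback user name is a placeholder):

```python
import os

# Sketch of what the ?if/?else in the profile evaluates to: the prefix is
# chosen once, in whichever process renders the profile.
if "SLURM_JOB_ID" in os.environ:
    prefix = f"/localscratch/{os.environ['SLURM_JOB_ID']}"
else:
    prefix = f"/dev/shm/{os.environ.get('USER', 'user')}/snakemake"
print(prefix)
```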

(and the env pointing to it), I can see that a directory /localscratch/snakemake<random_string> gets created on the node. However, jobs fail, because within a SLURM job context they are pointed to another directory, e.g. minimap2 -t 4 -d .snakemake/storage/fs/..., which simply fails, since no input can be found there.

Commenting out the fs-plugin lines in the config helps. Yet, eventually, my hope is for better scaling with a proper stage-in. ;-)

The rule in question is defined as

rule build_minimap_index:  ## build minimap2 index
    input:
        genome = config["transcriptome"]
    output:
        index = "index/transcriptome_index.mmi"
    params:
        opts = config["minimap_index_opts"]
    conda: "envs/env.yml"
    shell: """
        minimap2 -t {resources.cpus_per_task} {params.opts} -d {output.index} {input.genome}
    """

(the workflow in question is work in progress; a student of mine will take over), and strictly speaking this rule does not require stage-in/-out so far.

Any pointer is appreciated!

cmeesters commented 6 months ago

After getting all dependencies right and updating accordingly, it works like a charm:

input: index/transcriptome_index.mmi (retrieve from storage), /lustre/project/m2_zdvhpc/transcriptome_data/m18_bc06.fq.gz (retrieve from storage)
output: alignments/m18_bc01.fq.gz.bam (send to storage)
log: logs/minimap2/m18_bc01.fq.gz.log (send to storage)

Next, we need to restrict stage-in/-out to necessary files. ;-)

cmeesters commented 5 months ago

I'm afraid the issue recurred after updating:

@johanneskoester ideas?

cmeesters commented 5 months ago

currently:

snakemake 8.10.0
snakemake-executor-plugin-slurm 0.4.2
snakemake-executor-plugin-slurm-jobstep 0.1.11
snakemake-interface-common 1.17.1
snakemake-interface-executor-plugins 9.0.2
snakemake-interface-report-plugins 1.0.0
snakemake-interface-storage-plugins 3.1.1
snakemake-minimal 8.10.0
snakemake-storage-plugin-fs 1.0.0

A rule like:


rule test1:
    input: "foo.txt"
    output: "results/a.out"
    shell: """
        echo $(realpath {input}) >> {output}
    """

produces `/dev/shm/meesters/snakemake/fs/foo.txt`, which is not desired in the job context.
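The resolved path makes sense once you consider that, with a local storage prefix configured, the staged copy of an input lives under `<prefix>/fs/...`, and `realpath` resolves to that location. A minimal stdlib demonstration with hypothetical paths (this mimics the observed layout, not the plugin's actual staging mechanism):

```python
import os
import tempfile

# Hypothetical illustration: a file staged under <prefix>/fs/ resolves to the
# prefix location, not to a workflow-relative path.
with tempfile.TemporaryDirectory() as prefix:
    staged = os.path.join(prefix, "fs", "foo.txt")
    os.makedirs(os.path.dirname(staged))
    with open(staged, "w") as fh:
        fh.write("data\n")
    resolved = os.path.realpath(staged)
    print(resolved)  # .../fs/foo.txt, under the temporary prefix
```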
johanneskoester commented 4 months ago

Yes, I noticed later (posted in the other issue) that my idea did not really work, because the profile is not parsed again in the job context; instead, values are passed to the job from the main process. Therefore Snakemake now has a separate option, `remote-job-local-storage-prefix`, in which you can define a special case for the remote job. In your setup, like this:

executor: slurm
latency-wait: 5
default-storage-provider: fs
shared-fs-usage:
  - persistence
  - sources
  - source-cache
remote-job-local-storage-prefix: /localscratch/$SLURM_JOB_ID
local-storage-prefix: /dev/shm/$USER/snakemake
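To make the described semantics explicit: the main process uses `local-storage-prefix`, while remote (SLURM) jobs receive the value of `remote-job-local-storage-prefix` from it. A rough sketch of that selection (illustrative only, not Snakemake internals; the placeholder fallbacks are assumptions):

```python
import os

# Illustrative only: mirrors the described semantics of the two options.
def effective_prefix(is_remote_job: bool) -> str:
    if is_remote_job:
        # expanded on the compute node, where SLURM_JOB_ID is set
        return f"/localscratch/{os.environ.get('SLURM_JOB_ID', '<jobid>')}"
    # used by the main process (and local jobs)
    return f"/dev/shm/{os.environ.get('USER', '<user>')}/snakemake"

print(effective_prefix(False))
```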
johanneskoester commented 4 months ago

Extended the docs accordingly (visible with next release in Snakemake plugin catalog).