maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io
MIT License

tmp folder statement "tmpdir=<TBD>" #928

Closed: sunta3iouxos closed this issue 5 months ago

sunta3iouxos commented 11 months ago

Hi there, while running the pipeline I get the following:

rule origFASTQ2:
    input: /home/tgeorgom/fastq/AP01/A006200317_201082_S22_L000_R2_001.fastq.gz
    output: originalFASTQ/A006200317_201082_S22_L000_R2_001.fastq.gz
    jobid: 30
    reason: Missing output files: originalFASTQ/A006200317_201082_S22_L000_R2_001.fastq.gz
    wildcards: sample=A006200317_201082_S22_L000
    resources: mem_mb=1306, disk_mb=1306, **tmpdir=<TBD>**

Shouldn't the tmpdir be as dictated in defaults.yaml? This is my current configuration:

clusterConfig: shared/cluster.yaml
configMode: manual
emailSender: null
max_thread: 25
oldConfig: null
onlySSL: false
organismsDir: shared/organisms
smtpPassword: null
smtpPort: 0
smtpServer: null
smtpUsername: null
snakemakeOptions: ' --use-conda --conda-prefix /scratch/tgeorgom/mamba/snakePipes/envs '
tempDir: /scratch/tgeorgom/temp/
toolsVersion: true

Also, I have set the global TMPDIR environment variable:

$ echo $TMPDIR
/scratch/tgeorgom/temp/

When I run the script with the --local option, the tmp directory is properly recognised:

resources: tmpdir=/tmp
sunta3iouxos commented 11 months ago

I got word from my HPC people:

Use the local job RAMDISK for temporary files. It's sufficient to enlarge the per-core memory settings (cluster.memory) to compensate for the additional memory usage. For example, in the Slurm job template of snakePipes: tmpdir="/dev/shm/${SLURM_JOB_USER}.${SLURM_JOB_ID}". Those environment variables will be set in the job environment and not before. I don't remember if you can use the above setting directly in snakePipes or if the dollar signs have to be escaped. You'll have to test it.
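For illustration, here is a minimal sketch of what that suggestion would look like in a plain Slurm batch script. The /dev/shm path and the note about escaping come from the advice above; the script itself, including the memory value and the mkdir step, is only a hypothetical example and not a snakePipes file:

```bash
#!/bin/bash
#SBATCH --mem-per-cpu=2G   # hypothetical value; enlarge cluster.memory to cover RAMDISK usage

# Sketch of the HPC team's suggestion in a standalone Slurm script:
# SLURM_JOB_USER and SLURM_JOB_ID only exist once the job runs on the compute node.
export TMPDIR="/dev/shm/${SLURM_JOB_USER}.${SLURM_JOB_ID}"
mkdir -p "$TMPDIR"

# ... run the actual tool here, pointing its scratch files at $TMPDIR ...

# If the same line were placed in a template that the submitting process expands
# before the job exists, the dollar signs would likely need escaping, e.g.:
#   export TMPDIR="/dev/shm/\${SLURM_JOB_USER}.\${SLURM_JOB_ID}"
```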

How do I set this properly?

katsikora commented 9 months ago

Hi,

you can configure your snakePipes installation to use /dev/shm with `snakePipes config --tempDir /dev/shm`. snakePipes will then create temporary folders on that volume, using random alphanumeric strings appended to "snakepipes". The name of the random temporary folder is determined beforehand and passed to the main snakemake process, i.e. before any jobs are submitted to Slurm, so the Slurm 'user' and 'job ID' variables will not work in this case. This setup is meant for running the main process on a login node, which then submits rule-based jobs to the cluster.
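For reference, a minimal sketch of that setup as run on the login node. The `snakePipes config` call is the one named above; the `ls` check and the exact folder-name pattern are only illustrative assumptions:

```bash
# Point the snakePipes installation at /dev/shm for temporary data (command from above).
snakePipes config --tempDir /dev/shm

# While a workflow is running, the randomly named temporary folder should appear on
# that volume; the exact "snakepipes" + random-string naming may differ.
ls -d /dev/shm/snakepipes* 2>/dev/null
```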

Is that how you run snakePipes? Or do you submit your main process as a cluster job as well?