langmead-lab / monorail-external

examples to run monorail externally

Error with run_recount_pump.sh: samtools sort "Too many open files" #7

Closed dfermin closed 2 years ago

dfermin commented 2 years ago

Hello.

I'm trying to run the latest singularity image (1.0.8) with some FASTQ data files I have.

I'm running the run_recount_pump.sh script on a CentOS 7 server. The system has 256GB of RAM, the disk holding the FASTQ files and the monorail-external files has 25TB of space, and the server has 24 threads. Both FASTQ files are gzipped and 5GB in size.

I issued this command:

export NO_SHARED_MEM=1 && bash run_recount_pump.sh /nfs/md1/RECOUNT3/images/recount-rs5_latest.sif \
  X249521439 local hg38 20 \
  /nfs/md1/RECOUNT3/db /nfs/md1/RECOUNT3/tmp/clean.fastq/249521439_R1.fastq.gz \
  /nfs/md1/RECOUNT3/tmp/clean.fastq/249521439_R2.fastq.gz \
  X24952143

And I get this error message:

rule sort:
    input: /container-mounts/recount/temp_big/X249521439!X249521439!hg38!local.bam
    output: /container-mounts/recount/output/X249521439!X249521439!hg38!local~sorted.bam, /container-mounts/recount/output/X249521439!X249521439!hg38!local~sorted.bam.bai, /container-mounts/recount/output/X249521439!X249521439!hg38!local.idxstats
    log: /container-mounts/recount/output/X249521439!X249521439!hg38!local.sort.log
    jobid: 13
    wildcards: quad=X249521439!X249521439!hg38!local
    threads: 8

[bam_sort_core] merging from 1048 files and 8 in-memory blocks...
[E::hts_open_format] Failed to open file /container-mounts/recount/temp/sort_temp.X249521439/samtools_temp.1020.bam
samtools sort: fail to open "/container-mounts/recount/temp/sort_temp.X249521439/samtools_temp.1020.bam": Too many open files

Any suggestions on how to fix this?

Thanks in advance for any and all help.

ChristopherWilks commented 2 years ago

hi @dfermin,

First, I'd suggest sticking with the stable version of the pump singularity image (1.0.6); the later versions are experimental, though switching back may not fix this specific problem.

The other thing to consider is lowering the number of threads/concurrent processes you're asking for to 2-4 (from the 20 you're specifying), at least for testing, to see whether that addresses the problem. The number of samtools sort threads, and likely the number of concurrently open file handles, is tied to that setting.
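For background on where those 1048 temporary files come from: samtools sort spills a sorted chunk to disk whenever a thread's memory budget (the -m option, 768M by default) fills up, and the final merge has to hold all of those chunks open at once. A rough standalone illustration of the knobs involved (these are standard samtools sort flags, but the file names below are placeholders and the pipeline's internal invocation may not expose them directly):

samtools sort -@ 4 -m 4G \
  -T /nfs/md1/RECOUNT3/tmp/sort_temp \
  -o sample~sorted.bam \
  sample.bam

A larger -m produces fewer, larger temporary chunks, so the merge step needs far fewer simultaneously open file descriptors.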

You could also try raising the open-files limit on your server with ulimit -n (it's typically 1024 by default on Linux-like systems, and this samtools sort run clearly needs more than that). I'm not entirely sure the setting is inherited inside singularity, but it's worth a try if you want to run with all 20 cores.
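If you go the ulimit route, raise it in the same shell that launches the run so the whole process tree inherits it (whether singularity passes it through is the open question above). A minimal sketch, assuming your hard limit allows the higher value:

# current soft and hard limits for open file descriptors
ulimit -Sn
ulimit -Hn

# raise the soft limit for this shell session (4096 is just an example; the hard limit is the ceiling)
ulimit -n 4096

# sanity-check that the limit is visible inside the container before a full run
singularity exec /nfs/md1/RECOUNT3/images/recount-rs5_latest.sif bash -c 'ulimit -n'

# then launch run_recount_pump.sh from this same shell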