vaquerizaslab / fanc

FAN-C: Framework for the ANalysis of C-like data
GNU General Public License v3.0

fanc auto got sambamba error: Too many open files #141

Open hero-outman opened 1 year ago

hero-outman commented 1 year ago

Hello, I am running fanc auto with the following command:

fanc auto \
    -g ${hg38_MboI} \
    -n fanc.${matrix} \
    -b 5mb 2mb 1mb 500kb 250kb 100kb 50kb 25kb 10kb 5kb \
    -t ${threads} \
    --max-restriction-site-distance 10000 \
    --norm-method KR \
    -q ${mapq} \
    ${matrix_folder}${matrix}_R1.bam  ${matrix_folder}${matrix}_R2.bam \
    ${output_folder}${sample_name} &> HiC_matrix_${matrix}.out

It fails with a sambamba "Too many open files" error:

2023-01-17 17:06:20,197 INFO FAN-C version: 0.9.25
2023-01-17 17:06:20,251 INFO Getting regions
2023-01-17 17:06:20,301 INFO Output folder: ./FAN_C/HiC_matrices/sample/
2023-01-17 17:06:20,301 INFO Input files: ./BWA/sample_rep1_R1.bam, BWA/sample_rep1_R2.bam
2023-01-17 17:06:20,301 INFO Input file types: sam, sam
2023-01-17 17:06:20,301 INFO Final basename: fanc.sample_rep1 (you can change this with the -n option!)
2023-01-17 17:06:20,302 INFO Creating output folders...
2023-01-17 17:06:21,915 INFO FAN-C version: 0.9.25

sambamba 0.8.2
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.28.1 / DMD v2.098.1 / LLVM12.0.0 / bootstrap LDC - the LLVM D compiler (1.28.1)

2023-01-17 17:06:22,116 INFO FAN-C version: 0.9.25

sambamba 0.8.2
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.28.1 / DMD v2.098.1 / LLVM12.0.0 / bootstrap LDC - the LLVM D compiler (1.28.1)

sambamba-sort: Cannot open or create file '/Data/Computing_Temp/sambamba-pid89885-dpay/sample_rep1_R1.bam.339' : Too many open files
2023-01-17 21:47:06,935 WARNING sambamba failed, falling back to pysam/samtools
sambamba-sort: Cannot open or create file '/Data/Computing_Temp/sambamba-pid89875-fldc/sample_rep1_R2.bam.339' : Too many open files
2023-01-17 21:48:01,092 WARNING sambamba failed, falling back to pysam/samtools
2023-01-18 04:47:50,087 INFO FAN-C version: 0.9.25
2023-01-18 04:47:50,106 INFO Getting genome regions (fragments or bins)
2023-01-18 04:47:50,106 INFO Getting regions
2023-01-18 04:47:50,153 INFO Three arguments detected, assuming SAM/BAM input.
[E::idx_find_and_load] Could not retrieve index file for 'FAN_C/HiC_matrices/sample/sam/sample_rep1_R1_sort.bam'
2023-01-18 04:47:50,171 INFO Using filters appropriate for BWA.
[E::idx_find_and_load] Could not retrieve index file for 'FAN_C/HiC_matrices/sample/sam/sample_rep1_R1_sort.bam'
[E::idx_find_and_load] Could not retrieve index file for 'FAN_C/HiC_matrices/sample/sam/sample_rep1_R2_sort.bam'

After sambamba failed, its temp files were not removed, so they will eventually fill up the disk. The fallback is also time-consuming: the files are sorted again with samtools after the sambamba attempt has already run.

Could I get some hints on:

  1. How to avoid the sambamba "Too many open files" error when running fanc auto
  2. Or how to sort directly with samtools, to avoid losing time on a failed sambamba sort

Many thanks!
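
The .339 suffix on the temp file in the log above suggests that sambamba had already written several hundred chunk files when it hit the limit. The sketch below is only rough arithmetic. It assumes, as a simplification not checked against sambamba's source, that sambamba sort spills one sorted chunk to its temp directory each time its memory budget (-m/--memory-limit, documented default 2GB) fills up. The helper name estimate_chunks and the 680 GB figure are hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes one temp chunk per filled memory budget.
# estimate_chunks is a hypothetical helper, not part of sambamba or FAN-C.
estimate_chunks() {
    local data_gb="$1"   # approximate size of the records to sort, in GB
    local mem_gb="$2"    # sambamba memory limit, in GB
    # Integer ceiling division: ceil(data_gb / mem_gb)
    echo $(( (data_gb + mem_gb - 1) / mem_gb ))
}

# Hypothetical 680 GB of records:
estimate_chunks 680 2    # chunks at a 2 GB limit
estimate_chunks 680 10   # chunks at a 10 GB limit

# Soft limit on open files for processes started from this shell,
# to compare against the estimates above
ulimit -Sn
```

Under these assumptions the two calls print 340 and 68, so a larger memory limit means fewer chunk files to keep open at once.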

kaukrise commented 1 year ago

Hi, this is an issue with sambamba and your system's setup (you'll find lots of hits on Google for the "too many open files" issue). I have created a beta version you can try that introduces the --no-sambamba flag to fanc auto and fanc sort-sam. Could you please try it on your data?

fanc-0.9.26b3.tar.gz
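
For the "system's setup" side of this, one common approach is to raise the per-process open-file limit in the shell that launches fanc auto, since child processes inherit it. This is a minimal sketch, not something confirmed in this thread to fix the issue: the helper name raise_nofile_limit and the target of 4096 are assumptions, and the soft limit can only be raised up to the hard limit without administrator rights.

```shell
#!/usr/bin/env bash
# raise_nofile_limit is a hypothetical helper name.
# Raises the soft open-file limit of the current shell to TARGET,
# capped at the hard limit; never lowers an already higher limit.
raise_nofile_limit() {
    local target="$1"
    local soft hard
    soft="$(ulimit -Sn)"
    hard="$(ulimit -Hn)"
    # Nothing to do if the soft limit is already high enough
    if [ "$soft" = "unlimited" ]; then return 0; fi
    if [ "$soft" -ge "$target" ]; then return 0; fi
    # An unprivileged process cannot exceed the hard limit
    if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
        target="$hard"
    fi
    ulimit -Sn "$target"
}

echo "before: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
raise_nofile_limit 4096   # 4096 is an arbitrary example value
echo "after:  soft=$(ulimit -Sn)"
# fanc auto ... would then be started from this same shell
```

If the hard limit itself is too low, it has to be changed by an administrator (for example in the system's limits configuration), which is outside what a user script can do.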

hero-outman commented 1 year ago

Hi kaukrise, running fanc-0.9.26b3 with the --no-sambamba parameter gives the error below, but the same command works fine with fanc-0.9.25:

fanc auto \
    -g ${hg38_MboI.bed} \
    -n fanc.mydata_test_sambamba \
    -b 5mb 2mb 1mb \
    -t ${threads} \
    --max-restriction-site-distance 10000 \
    --norm-method KR \
    -q ${mapq} \
    --no-sambamba \
    mydata_R1_sortname.bam mydata_R2_sortname.bam \
    myFolder &> HiC_matrix_mydata.out &

errors:

2023-02-02 14:42:24,847 INFO FAN-C version: 0.9.26b3
2023-02-02 14:42:24,848 INFO Getting regions
2023-02-02 14:42:24,849 INFO Output folder: myFolder
2023-02-02 14:42:24,849 INFO Input files: mydata_R2.bam
2023-02-02 14:42:24,849 INFO Input file types: sam
2023-02-02 14:42:24,849 INFO Final basename: fanc.mydata_test_sambamba (you can change this with the -n option!)
2023-02-02 14:42:24,849 INFO Creating output folders...
Traceback (most recent call last):
  File "/myhome/mambaforge/envs/FAN-C_test/bin/fanc", line 127, in <module>
    Fanc()
  File "/myhome/mambaforge/envs/FAN-C_test/bin/fanc", line 93, in __init__
    command([sys.argv[0]] + sys.argv[option_ix:], log_level=log_level, verbosity=verbosity)
  File "/myhome/mambaforge/envs/FAN-C_test/lib/python3.9/site-packages/fanc/commands/fanc_commands.py", line 123, in auto
    return fanc.commands.auto.auto(argv, **kwargs)
  File "/myhome/mambaforge/envs/FAN-C_test/lib/python3.9/site-packages/fanc/commands/auto.py", line 1097, in auto
    if not file_types[i + 1] == 'sam':
IndexError: list index out of range

Also, the sambamba "Too many open files" issue can be solved by passing an explicit memory parameter such as -m 10G. Can the fanc command set this parameter, or is there a fanc config file that can allocate more memory to sambamba?

Best and many thanks! Chu
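
Until fanc exposes such a parameter, one possible workaround is to name-sort the BAM files with sambamba directly, passing -m by hand. The sketch below uses sambamba sort options that exist in sambamba 0.8.x (-N, -m, -t, --tmpdir, -o). The helper name presort_by_name, the SAMBAMBA override variable, and the example file names and values are assumptions for illustration. It also assumes FAN-C expects natural read-name order, as produced by samtools sort -n. Whether fanc auto skips its own sort step for input that is already sorted is not established in this thread.

```shell
#!/usr/bin/env bash
# presort_by_name is a hypothetical helper name.
# SAMBAMBA can be overridden, e.g. SAMBAMBA=echo for a dry run.
presort_by_name() {
    local in_bam="$1" out_bam="$2" mem="$3" threads="$4" tmpdir="$5"
    # -N: natural read-name order (as samtools sort -n)
    # -m: memory limit; -t: threads; --tmpdir: where temp chunks are written
    "${SAMBAMBA:-sambamba}" sort -N -m "$mem" -t "$threads" \
        --tmpdir "$tmpdir" -o "$out_bam" "$in_bam"
}

# Dry run: print the command arguments instead of running sambamba
SAMBAMBA=echo presort_by_name mydata_R1.bam mydata_R1_sortname.bam 10G 8 /tmp
```

Dropping the SAMBAMBA=echo prefix runs the real sort; the same call would be repeated for the R2 file.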