jandrewrfarrell / RUFUS

RUFUS k-mer based genomic variant detection

Memory_issue #11

Closed calhoujd closed 5 years ago

calhoujd commented 5 years ago

Hi, I'm evaluating RUFUS to potentially incorporate into our exome and genome variant calling pipelines. We recently got it installed on the HPC (sounds like with your help, thanks much!). I took it for a test drive today, and the following test run worked fine, producing the single de novo variant it is supposed to after a pretty quick runtime:

"#!/bin/bash
#SBATCH -A b1042
#SBATCH -p genomics
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 40:00:00
#SBATCH --mem=70gb

# rufus_test2.sh
# running on trio_LUR005 for a first test

iFolder="/software/RUFUS/testRun/"
oFolder="/projects/b1073/WGS_peds_epilepsies/jdc_sandbox/rufus"

genomeFASTA="/projects/b1073/pipelines/commonref/GRCh38/GRCh38_NoContigs.primary_assembly.genome.fa"

module purge all

module load rufus/latest
module load samtools/1.6

# bash runRufus.sh --subject Child.bam --controls Mother.bam --controls Father.bam --kmersize 25 --threads 40 --ref human_reference_v37_decoys.fa
# hmm specifying output directory isn't listed ??? does it print to stdout? do I need > file.txt ?

sh /software/RUFUS/runRufus.sh -s ${iFolder}/Child.bam -c ${iFolder}/Mother.bam -c ${iFolder}/Father.bam -k 25 -t 24 -m 8 -r /software/RUFUS/resources/references/small_test_human_reference_v37_decoys.fa

sh /software/RUFUS/runRufus.sh -s ${iFolder}/Child.bam -c ${iFolder}/Mother.bam -c ${iFolder}/Father.bam -k 25 -t 24 -m 8 -r /software/RUFUS/resources/references/small_test_human_reference_v37_decoys.fa > ${oFolder}/file2.txt

exit"

However, when I try to run a very similar script, but substituting some of our genomes in, I run into a memory issue (tail of slurm error report):

"~~~~ printing out paramater values used in script ~~~~
value of ProbandGenerator LUR_005_01_noalt_hg38_sort.bam.generator
Value of ParentGenerators: LUR_005_02_noalt_hg38_sort.bam.generator LUR_005_03_noalt_hg38_sort.bam.generator
Value of K is: 25
Value of Threads is: 24
value of ref is: /projects/b1073/pipelines/commonref/GRCh38/GRCh38_NoContigs.primary_assembly.genome.fa
value of min is:


Did not provide refHash
$_arg_min is empty
_arg_min is 
MutantMinCov is 
parent is  LUR_005_02_noalt_hg38_sort.bam.generator 
parent is  LUR_005_03_noalt_hg38_sort.bam.generator 
Running jellyfish for LUR_005_02_noalt_hg38_sort.bam.generator
Running jellyfish for LUR_005_01_noalt_hg38_sort.bam.generator
Running jellyfish for LUR_005_03_noalt_hg38_sort.bam.generator
slurmstepd: error: Job 475163 exceeded memory limit (103113338880 > 75161927680)"
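As a quick check of the two numbers in the slurmstepd line, here is a minimal sketch that converts them from bytes to GiB. The `bytes_to_gib` helper is a made-up name for illustration and is not part of RUFUS or Slurm. It suggests the job had reached roughly 96 GiB against a limit of exactly 70 GiB, so the `--mem=70gb` request was applied as written, and the three jellyfish runs that started just before the error appear to be what pushed usage past it.

```bash
#!/bin/bash
# Convert a byte count to GiB (1 GiB = 1024^3 bytes), printed with two decimals.
# bytes_to_gib is a hypothetical helper name, not a RUFUS or Slurm command.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

used=103113338880    # usage reported in the slurmstepd error line
limit=75161927680    # limit reported in the same line

echo "used:  $(bytes_to_gib "$used") GiB"
echo "limit: $(bytes_to_gib "$limit") GiB"
```

Under these numbers, a rerun on the same data would presumably need a `--mem` request somewhat above 96 GiB, though the real peak could be higher than the value captured at the moment the limit was hit.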

I used Slurm to request 70 GB of memory, which I figured would be enough for one trio on hg38, but maybe not? Any advice? Thanks in advance!

Take care,

Jeff (Postdoc in the Carvill lab at Northwestern)

PS - Here is full script:

"#!/bin/bash
#SBATCH -A b1042
#SBATCH -p genomics
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 40:00:00
#SBATCH --mem=70gb

# rufus_test.sh
# running on trio_LUR005 for a first test

iFolder="/projects/b1073/WGS_peds_epilepsies/trio_LUR005/bams"
oFolder="/projects/b1073/WGS_peds_epilepsies/jdc_sandbox/rufus"

genomeFASTA="/projects/b1073/pipelines/commonref/GRCh38/GRCh38_NoContigs.primary_assembly.genome.fa"

module purge all

module load rufus/latest
module load samtools/1.6

# bash runRufus.sh --subject Child.bam --controls Mother.bam  --controls Father.bam  --kmersize 25 --threads 40 --ref human_reference_v37_decoys.fa
# hmm specifying output directory isn't listed ??? does it print to stdout? do I need > file.txt ?

sh /software/RUFUS/runRufus.sh --subject ${iFolder}/LUR_005_01_noalt_hg38_sort.bam --controls ${iFolder}/LUR_005_02_noalt_hg38_sort.bam  --controls ${iFolder}/LUR_005_03_noalt_hg38_sort.bam  --kmersize 25 --threads 24 --ref $genomeFASTA

sh /software/RUFUS/runRufus.sh --subject ${iFolder}/LUR_005_01_noalt_hg38_sort.bam --controls ${iFolder}/LUR_005_02_noalt_hg38_sort.bam  --controls ${iFolder}/LUR_005_03_noalt_hg38_sort.bam  --kmersize 25 --threads 24 --ref $genomeFASTA > ${oFolder}/file.txt

exit"

PPS - The script didn't crash per se, but it appears to have stalled. The Jelly.chr files are slowly being updated as it runs. The generator Jhash.temp & fq files are empty.

calhoujd commented 5 years ago

It wasn't actually stalled; after a couple of hours, we got de novo variants. Closing issue. This tool looks like it is going to be super useful, thanks for building it and making it available!
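For anyone else who sees the same apparent stall, one rough way to tell a slow run from a hung one is to check whether any intermediate files have been written recently. The sketch below assumes it is run from the directory where the Jelly.chr files appear. `recently_modified` is a hypothetical helper, and the 10-minute window is an arbitrary choice. It relies on `find -mmin`, which GNU and BSD find provide but POSIX does not require.

```bash
#!/bin/bash
# List regular files under a directory that were modified within the last N minutes.
# recently_modified is a hypothetical helper name; the default window is 10 minutes.
recently_modified() {
    local dir="$1" minutes="${2:-10}"
    find "$dir" -type f -mmin "-${minutes}"
}

# Check the current directory (assumed to be where the intermediate files are written).
if [ -n "$(recently_modified . 10)" ]; then
    echo "files were written in the last 10 minutes; the job is probably still progressing"
else
    echo "no recent writes; the job may be stalled"
fi
```

This only shows that something is still writing to disk. It says nothing about how far along the run is, so a long quiet stretch between steps could still be read as a stall.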