luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Starting Call Set Refinement (CSR) filtering provokes RAM overload #280

Open amolares opened 1 month ago

amolares commented 1 month ago

Hi.

I'm trying to analyze viral data (Illumina paired-end FASTQ format), about 100 MB for R1 and R2 combined, on a machine with 16 cores and 30 GB of RAM. In earlier steps, fastp is used to filter the raw data and trim adapters, and bwa mem to generate an aligned CRAM file. When the "Starting Call Set Refinement (CSR) filtering" stage is reached, RAM usage grows to 100% of capacity and the process is killed abruptly.
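In outline, the preprocessing is roughly as follows (file names, reference, and options are simplified placeholders, not the exact commands used):

```shell
# Simplified outline of the preprocessing; file names, reference, and options
# are placeholders rather than the exact commands used.
fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
      -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz
bwa mem -t 16 virus_ref.fasta trimmed_R1.fastq.gz trimmed_R2.fastq.gz \
    | samtools sort -O cram --reference virus_ref.fasta -o sample.cram -
samtools index sample.cram
```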

Version

octopus version 0.7.4

Command

```shell
octopus --threads $THREADS -P 1 -R $VIRUSREF -I $SAMPLE.CRAM \
  --annotations AD ADP AF \
  --sequence-error-model PCR-free.HiSeq-2500 \
  -o $SAMPLE.OG.vcf.gz
```

The run is killed with:

```
vhd.run.cloud.sh: line 82: 72876 Killed
```

Additional context

Previously, another sample from the same type of virus, but with only 33 MB for the R1 and R2 FASTQ files combined, completed successfully with no errors.

Thank you very much for your support.

Svenvdm commented 4 weeks ago

Hi,

I'm currently having similar issues with Octopus processing human whole genomes (30x), using the Docker image run through Singularity. Octopus runs well until Call Set Refinement filtering starts, after which it is killed by the out-of-memory (OOM) killer. I haven't yet been able to figure out how to solve this. I have tried multiple configurations, adjusting the maximum RAM (up to 250 GB, 12 GB per core), without any success. Any ideas?

Version

octopus 0.7.4, Docker image pulled from https://hub.docker.com/r/dancooke/octopus

Command

```shell
singularity run --nv \
  --bind /path/to/bamfiles:/mnt/bamfiles,/path/to/ref:/mnt/ref,/path/to/output:/mnt/output \
  /path/to/singularity/image \
  -R /mnt/ref/Homo_sapiens_assembly38.fasta \
  -I /mnt/bamfiles/${bamName} \
  -T chr1 to chrX \
  --sequence-error-model PCRF.HISEQ-2500 \
  --forest-model /mnt/bamfiles/germline.v0.7.4.forest.gz \
  -o /mnt/output/${bamNameWithoutExtension}.vcf.gz \
  --threads $threads
```

Thanks in advance for your time! Kind regards, Sven

jelber2 commented 4 weeks ago

In the past, I have only run octopus on 30x human WGS on a machine with something like 1900 GB of RAM, but I did not look at the peak RAM usage. I would recommend splitting your job into parts, perhaps trying again per chromosome. Do you actually have the forest model file? A missing forest file could also cause an error, although I am not sure whether it would produce the same OOM kill.
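Something along these lines is what I mean by splitting per chromosome. This is only a rough sketch (paths, sample names, thread count, and the chromosome list are placeholders, and I have not tested it on your data); the per-chromosome VCFs are merged afterwards with bcftools concat:

```shell
# Rough sketch: call each chromosome separately to keep peak memory down,
# then merge the per-chromosome VCFs. Paths and options are placeholders.
REF=/mnt/ref/Homo_sapiens_assembly38.fasta
BAM=/mnt/bamfiles/sample.bam
VCFS=()

for CHR in chr{1..22} chrX; do
    octopus -R "$REF" -I "$BAM" -T "$CHR" \
        --forest-model /mnt/bamfiles/germline.v0.7.4.forest.gz \
        --threads 8 \
        -o "sample.${CHR}.vcf.gz"
    VCFS+=("sample.${CHR}.vcf.gz")
done

# Merge the per-chromosome call sets, keeping the chromosome order used above.
bcftools concat -O z -o sample.merged.vcf.gz "${VCFS[@]}"
bcftools index -t sample.merged.vcf.gz
```

If you also want to know how much memory the filtering step actually needs, wrapping the octopus command in GNU time (/usr/bin/time -v ...) will report the maximum resident set size at the end of the run.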