vtsyvina / CliqueSNV

MIT License

Memory heap getting bigger #4

Closed amirshams84 closed 3 years ago

amirshams84 commented 3 years ago

Hi, I am running CliqueSNV on a BAM file from an HIV sample, about 140 MB:

```
java -Xms50000M -Xmx50000M -jar /clique-snv.jar -m snv-illumina -in TestHIV.trimmed_fastp.mapped_bowtie2.bam -threads 30 -outDir $TEMP_PATH
```

The BAM is already cleaned, but the memory usage keeps growing with each attempt:

- attempt 1: 20 GB heap, failed
- attempt 2: 30 GB heap, failed
- attempt 3: 50 GB heap, failed
- attempt 4: 100 GB heap, still running

Is there any way to decrease the memory requirement?

- Idea 1: remove duplicates from the BAM
- Idea 2: split the BAM

Also, how can I estimate the amount of memory required? Something like: number of reads * number of cores + cosmos diameter / IO speed?

Thanks

vtsyvina commented 3 years ago

Hello, can you provide some more details about when exactly it fails? A stack trace would be nice. I can tell more if you save the output by running with the "-log" parameter.
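One possible way to capture that output (a sketch, not an official recipe: the command, heap size, and file names below are just placeholders reused from the command earlier in this thread; only `-log` comes from CliqueSNV itself):

```shell
# Run CliqueSNV with logging enabled and capture stdout/stderr to files.
# A Java OutOfMemoryError and its stack trace are printed to stderr,
# so they will end up in cliquesnv.err (file names here are examples).
java -Xms50000M -Xmx50000M -jar /clique-snv.jar -m snv-illumina \
  -in TestHIV.trimmed_fastp.mapped_bowtie2.bam \
  -threads 30 -outDir "$TEMP_PATH" -log \
  > cliquesnv.out 2> cliquesnv.err
```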

Are you using version 1.5.0 or later? We recently solved a similar problem for another user just by updating (1.5.0 included improvements in memory management).

In our tests, memory consumption mainly depends on the number of reads (with a few exceptions for unusual samples). Around 10 GB of RAM per one million reads is more than enough in most cases.
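That heuristic can be turned into a quick back-of-the-envelope calculation (a sketch only; `estimate_heap_gb` is a hypothetical helper based on the ~10 GB per million reads figure above, not part of CliqueSNV):

```python
import math

def estimate_heap_gb(read_count: int, gb_per_million: float = 10.0) -> int:
    """Rough heap estimate: ~10 GB of RAM per million reads,
    rounded up to a whole number of gigabytes (at least 1)."""
    return max(1, math.ceil(read_count / 1_000_000 * gb_per_million))

# e.g. a BAM with 2.5 million reads -> ask for ~25 GB (-Xmx25000M)
print(estimate_heap_gb(2_500_000))  # -> 25
```

The read count itself can be obtained from the BAM, for example with `samtools view -c file.bam`.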

amirshams84 commented 3 years ago

Hi, thanks for the quick reply. Here is the log file that was generated. Memory usage has reached 91.2 GB, so I think it is going to fail again: HIVtest.trimmed_fastp.mapped_bowtie2.haplotype_cliquesnv.log

vtsyvina commented 3 years ago

That is strange. What is the stack trace when it runs out of memory? Then I can see exactly where it fails.

amirshams84 commented 3 years ago

I am not sure how to get the stack trace, but I can hand over the BAM file if you want to try it yourself. Let me know which transfer method works for you.

vtsyvina commented 3 years ago

Yes, with the BAM file it will be even easier to find the problem. Thanks.

P.S. A stack trace is just the error with which the run fails. Something like this:

```
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap.newNode(HashMap.java:1742)
	at java.util.HashMap.putVal(HashMap.java:630)
	at java.util.HashMap.put(HashMap.java:611)
	at java.util.HashSet.add(HashSet.java:219)
	at edu.gsu.algorithm.AbstractSNV.slowCliquesMerging(AbstractSNV.java:401)
	at edu.gsu.algorithm.AbstractSNV.getMergedCliques(AbstractSNV.java:59)
	at edu.gsu.algorithm.SNVIlluminaMethod.getMergedCliques(SNVIlluminaMethod.java:213)
	at edu.gsu.algorithm.SNVIlluminaMethod.getHaplotypes(SNVIlluminaMethod.java:103)
	at edu.gsu.start.Start.illumina2SNV(Start.java:108)
	at edu.gsu.start.Start.main(Start.java:56)
```

amirshams84 commented 3 years ago

It seems the problem was with the BAM file, not the application.