mozack / abra

Assembly Based ReAligner
MIT License

Exception in thread "Thread-212177" java.lang.OutOfMemoryError: PermGen space #6

Closed Gig77 closed 10 years ago

Gig77 commented 10 years ago

Got the following error message:

[main] Version: 0.7.9a-r786
[main] CMD: bwa samse -n 1000 abra_temp_dir/clean_contigs.fasta abra_temp_dir/temp3/align_to_contig.sam.sai abra_temp_dir/temp3/original_reads.fastq.gz
[main] Real time: 1067.556 sec; CPU: 888.432 sec
Stream thread done.
Stream thread done.
BWA time: 1069 seconds.
Clock time in Align to contigs: 12671
Sun Jun 15 19:51:16 CEST 2014 : Adjust reads
Sun Jun 15 19:51:16 CEST 2014 : Adjusting reads.
Sun Jun 15 19:51:16 CEST 2014 : Adjusting reads.
Sun Jun 15 19:51:16 CEST 2014 : Adjusting reads.
Exception in thread "Thread-212177" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:85)
    at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:209)
    at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:100)
    at net.sf.samtools.SAMTextReader.readHeader(SAMTextReader.java:185)
    at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:62)
    at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:71)
    at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:556)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:167)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:122)
    at abra.ReadAdjuster.adjustReads(ReadAdjuster.java:55)
    at abra.AdjustReadsRunnable.go(AdjustReadsRunnable.java:37)
    at abra.AbraRunnable.run(AbraRunnable.java:19)
    at java.lang.Thread.run(Thread.java:679)

Here is how I ran abra:

PATH=$PATH:~/tools/bwa-0.7.9 java -Xmx32G \
  -jar ~/tools/abra-0.77/abra-0.77-SNAPSHOT-jar-with-dependencies.jar \
  --in /data/current/bam/108C.duplicate_marked.realigned.recalibrated.bam,/data/current/bam/108D.duplicate_marked.realigned.recalibrated.bam,/data/current/bam/108R.duplicate_marked.realigned.recalibrated.bam \
  --kmer 43,53,63,73,83 \
  --out /data/current/bam/108C.duplicate_marked.realigned.recalibrated.abra.bam,/data/current/bam/108D.duplicate_marked.realigned.recalibrated.abra.bam,/data/current/bam/108R.duplicate_marked.realigned.recalibrated.abra.bam \
  --ref ~/generic/data/broad/human_g1k_v37.fasta \
  --targets <(cut -f 1,2,3 /generic/data/illumina/nexterarapidcapture_exome_targetedregions.nochr.bed) \
  --threads 5 \
  --working abra_temp_dir 2>&1 \
  | grep -v "Max SAM Read name length exceeded" \
  | tee abra.log

How much RAM is required to run abra? I allocated 32 GB to the Java VM. In the example above, I ran abra on three exomes from the same patient, each sequenced at an average coverage of ~50x.

mozack commented 10 years ago

32GB should be more than enough. We've had good luck with 16GB for paired exomes sequenced to 150X.

A few questions:

1) What version of java are you using?

2) Can you let me know how big the contigs file is? For example, the output of ls -lh abra_temp_dir/clean_contigs.fasta and wc -l abra_temp_dir/clean_contigs.fasta

3) Can you tell me a bit about the input data? For example: read length, paired vs. single end, sequencing technology used, error rates.

Gig77 commented 10 years ago

Java version:

java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

/data/abra$ ls -lh abra_temp_dir/clean_contigs.fasta
-rw-r--r-- 1 cf cf 306M Jun 15 16:20 abra_temp_dir/clean_contigs.fasta

/data/abra$ wc -l abra_temp_dir/clean_contigs.fasta
1249488 abra_temp_dir/clean_contigs.fasta
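Note that wc -l counts every line in the file, both sequence names and sequence lines, so it is not the contig count itself. As a minimal sketch (not part of abra; the function name is hypothetical), the number of contigs can be counted by looking only at FASTA header lines, which start with ">":

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'.

    `lines` is any iterable of text lines, for example an open file handle.
    Hypothetical helper for illustration; it is not part of abra.
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
    return count
```

Passing an open handle on abra_temp_dir/clean_contigs.fasta to this function would give the number of contigs, which is what matters for the size of the BAM header.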

Input data: Illumina HiSeq 2000, 100bp paired-end, exome sequencing with Nextera Enrichment Kit

Should I try to run it with only 2 BAM files as input instead of 3?

mozack commented 10 years ago

Short answer:

Please run using a recent version of Java 7 and ABRA v0.78.

Detailed answer:

The PermGen error you're seeing is in a different (and much smaller) memory space than the 32GB you've allocated for the java heap. ABRA relies upon Picard Tools' SAM-JDK to read and write BAM files. The SAM-JDK appears to use PermGen space to load information about each @SQ in the BAM header. If you're really interested, you'll find that running Picard's ViewSam on abra_temp_dir/temp1/align_to_contig.sam generates the same error in Java 6. The number of putative contigs that wind up in the BAM header is large.
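To get a rough sense of how much sequence-name data ends up in the header, the @SQ lines of a SAM file can be tallied. The following is a minimal sketch under the assumption that the header follows the standard tab-separated TAG:VALUE layout of the SAM format; the function name is hypothetical and the result is only an indicator, not a measurement of PermGen usage:

```python
def summarize_sq_lines(header_lines):
    """Return (number of @SQ lines, total characters in their SN names).

    `header_lines` is any iterable of SAM text lines. Lines that are not
    @SQ header lines are ignored. Hypothetical helper for illustration.
    """
    count = 0
    name_chars = 0
    for line in header_lines:
        if not line.startswith("@SQ\t"):
            continue
        count += 1
        # Fields after the record type are TAG:VALUE pairs; SN is the name.
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                name_chars += len(field) - len("SN:")
    return count, name_chars
```

A very large first value for abra_temp_dir/temp1/align_to_contig.sam would be consistent with the explanation above, since the stack trace shows a String.intern call made while each @SQ line is parsed.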

Recent versions of Java 7 do not make use of the PermGen in the same fashion and will not generate this error.

If you are unable to upgrade to Java 7, you can work around the PermGen issue by including a Java option similar to the following:

-XX:MaxPermSize=256M

However, please keep in mind that our testing has been with Java 7.
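For anyone wrapping abra in a launcher script, the choice of JVM options could be sketched as follows. This is an illustration only: the function names, the default heap size, and the rule "add the flag only for Java 6 or older" are assumptions drawn from this thread, not something abra provides.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output.

    Handles the legacy scheme ("1.6.0_27" means Java 6) and the newer
    scheme ("11.0.2" means Java 11). Hypothetical helper.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def jvm_options(version_output, heap="32G", max_perm="256M"):
    """Build JVM options, adding MaxPermSize only for Java 6 or older.

    The threshold is an assumption based on this thread, where the error
    was reported on Java 6 and did not occur on Java 7.
    """
    options = ["-Xmx" + heap]
    if java_major_version(version_output) <= 6:
        options.append("-XX:MaxPermSize=" + max_perm)
    return options
```

The returned list would be placed between java and -jar on the command line. Note that java -version writes its output to standard error, so a wrapper would need to capture that stream.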

Additionally, you may be affected by this issue: https://github.com/mozack/abra/issues/4

I recommend using the v0.78 release instead.

Thanks for reporting this and please let me know if you run into any more problems.

Gig77 commented 10 years ago

Error gone after upgrading to Java 7 and using abra version 0.78.