mozack / abra

Assembly Based ReAligner
MIT License

java.lang.NegativeArraySizeException #12

Closed. etwatson closed this issue 9 years ago

etwatson commented 10 years ago

Greetings! I am encountering a Java error when running abra. I thought the problem was that my regions are too small (smaller than the kmer size). I have a de novo genome and I am interested in coding indels, so my regions file is a CDS.bed file, and my gene prediction software produced some impossibly small genes. However, a larger CDS also produces the same error.

Here is my command:

java -Xmx34g -jar abra-0.82.jar --in SC_049.srt.RG.bam --ref genome_v3.fasta --out SC_049.abra.bam --working abra/ --targets CDS.bed

It crashes on the first scaffold following a particular region, scaffold_252_213187_213194:

Assembling: -> abra//scaffold_252_68589_69038_contigs.fasta_k13
Done assembling(0): abra//scaffold_252_68389_68789_contigs.fasta_k13, 24
Elapsed_msecs_in_NativeAssembler    Region: scaffold_252_68389_68789    Length: 400 ReadCount:  334 Elapsed 73  Assembled   true
Mon Sep 29 11:15:44 PDT 2014 : Processing region: scaffold_252_213187_213194
java.lang.NegativeArraySizeException
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
    at java.lang.StringBuffer.<init>(StringBuffer.java:108)
    at abra.CompareToReference2.getSequence(CompareToReference2.java:392)
    at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
    at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:97)
    at abra.NativeAssembler.assembleContigs(NativeAssembler.java:140)
    at abra.ReAligner.processRegion(ReAligner.java:691)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:19)
    at java.lang.Thread.run(Thread.java:745)
java.lang.NegativeArraySizeException
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
    at java.lang.StringBuffer.<init>(StringBuffer.java:108)
    at abra.CompareToReference2.getSequence(CompareToReference2.java:392)
    at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
    at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:97)
    at abra.NativeAssembler.assembleContigs(NativeAssembler.java:140)
    at abra.ReAligner.processRegion(ReAligner.java:691)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:19)
    at java.lang.Thread.run(Thread.java:745)
Num reads: 360
Num nodes: 1039
Remaining nodes after pruning step 1: 756
Remaining nodes after pruning step 2: 754
num root nodes: 2
Done assembling(0): abra//scaffold_252_60836_61423_contigs.fasta_k15, 144
Done assembling(0): abra//scaffold_252_68589_69038_contigs.fasta_k13, 289

Having a look at this region, it is smaller than the kmer and likely not real: scaffold_252_213187_213194

However, I get the same error with a larger region (although still very small): scaffold_252_221019_221342

mozack commented 10 years ago

Thanks for narrowing this down to the region size. The assembly step is multi-threaded (by default, up to 4 regions are processed simultaneously), so the stack trace you see in the logs may or may not be associated with the log message immediately preceding it. Could you try again with the small scaffolds removed?

mozack commented 10 years ago

Actually, these smaller regions should be OK. Can you post your bed file?

etwatson commented 10 years ago

I tried rerunning with no regions < 100bp and it still failed.

Here is my BED file: http://www-bcf.usc.edu/~ericwats/CDS.bed
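
For reference, the length filter described above (dropping regions shorter than 100 bp) can be sketched as follows. This is a minimal illustration and not the command that was actually run: the function name `filter_short_regions` and the `min_length` parameter are hypothetical, and the sample coordinates are borrowed from region names in the log output purely as example values.

```python
def filter_short_regions(lines, min_length=100):
    """Return the BED lines whose region spans at least min_length bases."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        # BED intervals are 0-based and half-open, so the length is end - start.
        if end - start >= min_length:
            kept.append(line.rstrip("\n"))
    return kept


# Sample values only; these are not lines taken from the real CDS.bed.
example = [
    "scaffold_252\t68389\t68789",
    "scaffold_252\t213187\t213194",
    "scaffold_252\t221019\t221342",
]

for region in filter_short_regions(example):
    print(region)
```

With these sample values the 7 bp region is dropped and the other two are kept.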

mozack commented 10 years ago

The software currently assumes that the input regions are sorted by coordinate in increasing order. I will make a tweak to handle this (or at least provide an informative message) in the next release.

The following should get your bed file into a usable format:

cat CDS.bed | sort -k1,1r -k2,2n > CDS2.bed
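
To confirm that a BED file satisfies the sorted-input assumption before starting a long run, a check along these lines could be used. It is a sketch under stated assumptions, not part of abra: `find_unsorted_regions` is a hypothetical helper, and it only checks that each contig's lines form one contiguous block with non-decreasing start coordinates. It does not check the order of the contigs themselves.

```python
def find_unsorted_regions(lines):
    """Return (line_number, message) pairs describing ordering problems."""
    problems = []
    finished_contigs = set()  # contigs whose block of lines has already ended
    current_contig = None
    previous_start = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, start = fields[0], int(fields[1])
        if contig != current_contig:
            if contig in finished_contigs:
                problems.append((number, f"{contig} appears in more than one block"))
            if current_contig is not None:
                finished_contigs.add(current_contig)
            current_contig = contig
            previous_start = None
        if previous_start is not None and start < previous_start:
            problems.append(
                (number, f"{contig} start {start} follows larger start {previous_start}")
            )
        previous_start = start
    return problems


# Sample values only, arranged to trigger both kinds of problem.
unsorted_example = [
    "scaffold_108\t68506\t68906",
    "scaffold_108\t66239\t66803",
    "scaffold_252\t10\t20",
    "scaffold_108\t70653\t70863",
]

for line_number, message in find_unsorted_regions(unsorted_example):
    print(f"line {line_number}: {message}")
```

An empty result means no ordering problem was found by this particular check; it says nothing about other ways a region file might be invalid.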

etwatson commented 10 years ago

Unless I missed it, a line in the manual would do just fine ;)

etwatson commented 10 years ago

After sorting as above, I get the same NegativeArraySizeException much later after processing several contigs.

etwatson commented 10 years ago

Here is the error output:

Done assembling(0): abra//scaffold_108_66239_66803_contigs.fasta_k11, 6
Elapsed_msecs_in_NativeAssembler    Region: scaffold_108_66239_66803    Length: 564 ReadCount:  175 Elapsed 63  Assembled   true
Done assembling(0): abra//scaffold_108_68506_68906_contigs.fasta_k13, 16
Elapsed_msecs_in_NativeAssembler    Region: scaffold_108_68506_68906    Length: 400 ReadCount:  151 Elapsed 66  Assembled   true
Wed Oct 29 17:50:46 PDT 2014 : Processing region: scaffold_108_69854_69855
Wed Oct 29 17:50:46 PDT 2014 : Processing region: scaffold_108_70653_70863
Wed Oct 29 17:50:46 PDT 2014 : Processing region: scaffold_108_70984_71229
Dynamic -- scaffold_108_70653_70863 11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,java.lang.NegativeArraySizeException

    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
    at java.lang.StringBuffer.<init>(StringBuffer.java:108)
    at abra.CompareToReference2.getSequence(CompareToReference2.java:392)
    at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
    at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:97)
    at abra.NativeAssembler.assembleContigs(NativeAssembler.java:140)
    at abra.ReAligner.processRegion(ReAligner.java:691)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:19)
    at java.lang.Thread.run(Thread.java:745)
java.lang.NegativeArraySizeException
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
    at java.lang.StringBuffer.<init>(StringBuffer.java:108)
    at abra.CompareToReference2.getSequence(CompareToReference2.java:392)
    at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
    at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:97)
    at abra.NativeAssembler.assembleContigs(NativeAssembler.java:140)
    at abra.ReAligner.processRegion(ReAligner.java:691)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:19)
    at java.lang.Thread.run(Thread.java:745)

mozack commented 10 years ago

Could you please compress and post the complete log file?

Alternatively, if you're able to share your data, I'd be happy to download and troubleshoot here.

mozack commented 9 years ago

Added additional target region validation on startup to hopefully generate more meaningful error messages.
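
As an illustration of what per-region validation can look like, here is a minimal sketch. It is not abra's actual implementation, which this thread does not show; `validate_region` and its messages are hypothetical. The stack traces above show the exception being raised while a StringBuffer is constructed, which is what happens when the requested capacity is negative, so a region whose end does not exceed its start is one plausible (unconfirmed) trigger worth checking for.

```python
def validate_region(fields):
    """Return a list of problems found in the columns of one BED line."""
    if len(fields) < 3:
        return ["expected at least 3 columns (contig, start, end)"]
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return ["start and end must be integers"]
    problems = []
    if start < 0:
        problems.append("start is negative")
    if end <= start:
        # A reversed or empty interval gives a zero or negative length.
        problems.append("end must be greater than start")
    return problems


# Sample values only; the second line is deliberately reversed.
for line in ["scaffold_108\t69854\t69855", "scaffold_108\t70863\t70653"]:
    issues = validate_region(line.split())
    print(line.replace("\t", " "), "->", issues if issues else "ok")
```

Running checks like these on every line at startup lets a bad region be reported by name, instead of surfacing later as an exception in a worker thread.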