mozack / abra2


crash with java IndexOutOfBoundsException #18

Closed anso-sertier closed 6 years ago

anso-sertier commented 6 years ago

Hi,

I'm running ABRA2 on WGS data ([30,80]x samples) and I get this error multiple times, each time on a different region.

INFO Wed Mar 07 01:29:48 CET 2018 PROCESS_REGION_MSECS: 1_23599001_23599401 1 0 0 0
ERROR Wed Mar 07 01:29:48 CET 2018 Error parsing assembled contigs. Line: [>1_121484601_121485001_21]
[...]
java.lang.ArrayIndexOutOfBoundsException: 4
    at abra.ScoredContig.convertAndFilter(ScoredContig.java:53)
    at abra.ReAligner.assemble(ReAligner.java:1096)
    at abra.ReAligner.processRegion(ReAligner.java:1262)
    at abra.ReAligner.processChromosomeChunk(ReAligner.java:342)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:20)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I saw a previous issue describing this error, but it was with version 2.9, and gkl is now disabled by default.

I'm using ABRA2 version 2.14 compiled on CentOS 6.2 with jdk1.8.0_162. ABRA2 was launched with default options (except 8 threads), and normal and tumor BAMs were provided. Alignments were made with BWA aln on GRCh37. 4 of 8 samples have already crashed with this error, and the remaining 4 have been running for 7 days (8 CPUs + 30 GB RAM allocated). Can you also tell me whether this running time seems right for WGS, or whether I can speed it up (for example, by removing centromere regions)?

Thanks in advance,

Anne-Sophie

mozack commented 6 years ago

Those runtimes seem rather long. I would not expect centromeres to be the issue as low mapq reads should not contribute to contigs.

Could you please attach an entire log file containing the error?

anso-sertier commented 6 years ago

Hi, thanks for your answer. The file is too big to attach; you can download it from this link:

https://www.dropbox.com/s/0mi0w6ojvbp9h44/P03257.oe?dl=0

Thanks in advance,

Anne-Sophie

mozack commented 6 years ago

Unfortunately, I don't see a smoking gun cause of the Exception. If you're able to share a BAM snippet that reproduces the problem, I'd be happy to take a look.

Regarding the long runtimes, it appears that an inordinate number of assemblies are being triggered. If you have adapter contamination, you may wish to trim first. Otherwise, you may wish to try running with the --sa option to disable the full-blown assembly.

anso-sertier commented 6 years ago

Thanks a lot for your answer. Regarding the running time, I have highly rearranged tumors. Only one of the tested samples finished (in 12 days), and it is the least rearranged one. I have between 1 and 6 million raw positions with soft-clipped reads in the tumors and fewer than 2 million in the paired normal samples. This could perhaps explain the inordinate number of assemblies. QC does not show any adapter contamination. I'm now testing with the --sa option. However, I was wondering how this option affects the ABRA process, given that ABRA performs assembly. Can you give me some pointers about this option? Thanks a lot again.

mozack commented 6 years ago

Unlike the original ABRA, ABRA2 does not rely exclusively on assembly. In our testing we see good results even with assembly disabled. Sensitivity for longer inserts will be impacted, but the overall results should still be good.

I have plans to investigate making the assembly triggers less frequent without negatively impacting sensitivity, however it may take me some time to get to this. In the meantime, the --sa option should be a reasonable workaround. Please let me know if you continue to run into problems.

nalcala commented 6 years ago

Hi,

I am also having occasional crashes with a "java.lang.ArrayIndexOutOfBoundsException", on 13 out of 210 RNA-seq BAM files on which I launched ABRA2.

The command line I used is:

    java -Xmx40g -jar abra2-2.14.jar --in XX.bam --out "XX_abra.bam" --ref ref_genome.fa --tmpdir . --threads 4 --index --junctions STAR.XX.SJ.out.tab --gtf ref_annot.gtf --sua --dist 500000

The crash is:

ERROR   Fri Apr 20 00:01:56 CEST 2018   Read buffer: [
[...]
]
java.lang.ArrayIndexOutOfBoundsException: 4
    at abra.ScoredContig.convertAndFilter(ScoredContig.java:53)
    at abra.ReAligner.assemble(ReAligner.java:1096)
    at abra.ReAligner.processRegion(ReAligner.java:1262)
    at abra.ReAligner.processChromosomeChunk(ReAligner.java:342)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:20)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Here is the complete log: XX_abra.log.zip

Could it be a problem with the input BAM files? I will try re-aligning them just in case (I used read trimming with cutadapt, mapping with STAR 2-pass, and sorting with sambamba).

Thanks!

nalcala commented 6 years ago

In both our cases, the error appears to be caused by a contig assembled by ABRA2 that was not stored properly. Instead of a contigString of the form >chr1_59247401_59247801_1_score AACAG...

it looks like >chr1_59247401_59 (the first line is cut short; in @anso-sertier's case it is "1_121484601_121485001_21", which is cut just before the score; in my case it is even weirder, because it is cut within the region name).

As a result, ABRA's convertAndFilter function cannot parse it correctly: instead of finding the score as element 4 of the line starting with '>', it finds nothing and throws an ArrayIndexOutOfBoundsException for element 4.

I guess a quick and dirty fix would be to have an option "ignore_parsing_errors" that would just ignore such weird contigs (in my case, it is contig 173 of the region that causes the crash, the previous ones were fine) and just put something in the log. It seems this kind of event is pretty rare so this should not throw away a lot of data.
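To make the failure mode and the proposed skip-and-log behaviour concrete, here is a minimal Python sketch. It is not ABRA2's code (which is Java); it only mimics the logic described above. It assumes that each header line is followed by one sequence line, that header fields are separated by "_", that the chromosome name itself contains no "_", and that the score is numeric. The names parse_score and parse_contigs, and the scores in the usage example, are hypothetical.

```python
import logging

logger = logging.getLogger("contig_parse_sketch")

SCORE_FIELD = 4  # position of the score once the header is split on "_"


def parse_score(header):
    """Strict parse: raises IndexError when the header is truncated."""
    fields = header.lstrip(">").split("_")
    return float(fields[SCORE_FIELD])


def parse_contigs(lines, ignore_bad=False):
    """Return (score, sequence) pairs from alternating header/sequence lines.

    With ignore_bad=False a truncated header raises, as in the crash above.
    With ignore_bad=True the malformed contig is logged and skipped.
    """
    contigs = []
    score = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            try:
                score = parse_score(line)
            except (IndexError, ValueError):
                if not ignore_bad:
                    raise
                logger.warning("Skipping malformed contig header: %r", line)
                score = None
            continue
        if score is not None:
            contigs.append((score, line))
            score = None
    return contigs
```

For example, with a truncated second header (illustrative values only):

```python
sample = [
    ">chr1_59247401_59247801_1_12.5", "AACAG",
    ">chr1_59247401_59", "ACGT",
    ">chr1_59247401_59247801_2_7.0", "TTGCA",
]
parse_contigs(sample, ignore_bad=True)  # keeps the first and third contigs
```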

I am trying to get a small reproducible example but so far cutting the BAM around the region leading to the error has actually avoided the error entirely...

mozack commented 6 years ago

Thanks for the input. I'll look into adding this option shortly.

nalcala commented 6 years ago

Thanks!

A more permanent solution would be great, but I could not get a reproducible example to help you... Actually, the error occurs on a different chromosome at each attempt, so the whole randomness of it makes the search for a small reproducible example illusory. I imagine that these errors have to do with something going wrong during the contig_str build, possibly here https://github.com/mozack/abra2/blob/43a9104db3407e2391f277085f43315bba252e4e/src/main/c/assembler.cpp#L858 Judging that it is sort of random, could it have to do with the memory usage?

As an alternative workaround, I have split the BAM file into chromosomes and run ABRA2 separately on each one of them, relaunching the ones that fail. I saw on a closed issue that you said it should be fine because the ABRA2's parallelization is at the megabase scale, so different chromosomes should be treated independently anyway. Am I assuming correctly? What happens in case of a read split between 2 chromosomes (e.g., due to a rearrangement)? Is this different in the case of RNA-seq (although STAR junctions seem to be within-chromosome)?
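The relaunch-on-failure workaround described above can be sketched roughly as follows. This is only an illustration under several assumptions: samtools is on the PATH, the input BAM is coordinate-sorted and indexed, a crashed ABRA2 run exits with a non-zero status, and only the options already shown in this thread are needed. The helper names (split_cmd, abra2_cmd, realign_by_chromosome) and the output file naming are hypothetical.

```python
import subprocess


def split_cmd(bam, chrom, out_bam):
    # Extract one chromosome; requires an indexed, coordinate-sorted BAM.
    return ["samtools", "view", "-b", "-o", out_bam, bam, chrom]


def abra2_cmd(jar, in_bam, out_bam, ref, threads=4, tmpdir="."):
    # Only options that appear in this thread; RNA-seq users would add their own.
    return ["java", "-Xmx40g", "-jar", jar,
            "--in", in_bam, "--out", out_bam, "--ref", ref,
            "--tmpdir", tmpdir, "--threads", str(threads)]


def run_ok(cmd):
    # Assumption: a failed run is signalled by a non-zero exit status.
    return subprocess.run(cmd).returncode == 0


def realign_by_chromosome(bam, chroms, jar, ref, max_attempts=3, run=run_ok):
    """Run ABRA2 once per chromosome, relaunching failed chromosomes.

    Returns the chromosomes that still failed after max_attempts.
    """
    failed = []
    for chrom in chroms:
        part = f"{chrom}.bam"
        out = f"{chrom}_abra.bam"
        if not (run(split_cmd(bam, chrom, part))
                and run(["samtools", "index", part])):
            failed.append(chrom)
            continue
        for _ in range(max_attempts):
            if run(abra2_cmd(jar, part, out, ref)):
                break
        else:
            failed.append(chrom)
    return failed
```

Calling realign_by_chromosome("XX.bam", ["1", "2", "X"], "abra2-2.14.jar", "ref_genome.fa") would return the chromosomes that never completed. The run parameter exists so the retry logic can be exercised with a stub instead of real tools.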

mozack commented 6 years ago

Option --ignore-bad-assembly has been introduced in release 2.15.

Note that I have not tested this as I have not recently encountered this issue. Feedback is appreciated.

Judging that it is sort of random, could it have to do with the memory usage?

This is entirely possible and I will be looking into it.

I saw on a closed issue that you said it should be fine because the ABRA2's parallelization is at the megabase scale, so different chromosomes should be treated independently anyway. Am I assuming correctly?

Yes, this is correct.

What happens in case of a read split between 2 chromosomes (e.g., due to a rearrangement)?

ABRA2 does not currently attempt to deal with structural variants. It is possible though, that the non-clipped portion of the read may be realigned if there are additional variants near the breakpoint.

Is this different in the case of RNA-seq (although STAR junctions seem to be within-chromosome)?

No

mozack commented 6 years ago

I have data that reproduces this problem now and it is clearly memory corruption. I am testing a fix and hope to have a new release including the fix available in the next few days. Thanks for your patience.

mfoll commented 6 years ago

👍

mozack commented 6 years ago

Release 2.17 should resolve this. Please let me know if you continue to see issues.

mozack commented 6 years ago

Closing this. Please feel free to re-open if the problem happens again.

heqisun commented 4 years ago

Hi @mozack, I encountered the same memory corruption problem even with the latest version (v2.22).

mozack commented 4 years ago

Please share some details about your failed run.

For example: What command line are you running? What kind of data are you running on? Does the problem happen consistently across samples? Are you running with gkl enabled? Please also include any other details particular to your run and compute environment.

heqisun commented 4 years ago

@mozack I run it with snakemake in a conda environment on an HPC cluster. The command line is:

    abra2 --in {input} --out {output} --ref {params.ref} --threads {threads} --tmpdir {params.dir} > abra_{type}.log

Yes, the problem happens with all my WGS samples every time I run it, but it works fine with small samples.

mozack commented 4 years ago

Please zip and attach a log file.

heqisun commented 4 years ago

The abra.log file is actually empty.

heqisun commented 4 years ago

13512281.out.gz @mozack This is the stdout file I got

mozack commented 4 years ago

How did you acquire this version of abra?

heqisun commented 4 years ago

I acquired it through conda install

heqisun commented 4 years ago

conda install -c genomedk abra2

heqisun commented 4 years ago

Can I do the realignment for each chromosome separately, if that could be a way to solve the memory problem? @mozack

mozack commented 4 years ago

This isn't related to the original issue. The default java heap size used in Conda is too small for your runs and you will need to allocate more RAM. I have not used the Conda installation myself (someone else set this up), but it looks like you can specify more RAM by setting the JAVA_TOOL_OPTIONS environment variable.

For example: export JAVA_TOOL_OPTIONS="-Xmx32G"

You may need to experiment to determine the optimal amount of RAM for your samples.
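For pipelines that launch ABRA2 from Python rather than from a shell, the same setting can be passed through the child process environment. This is a minimal sketch that assumes the Conda abra2 launcher honours JAVA_TOOL_OPTIONS as suggested above; java_env is a hypothetical helper, and 32G is only the example value from this comment.

```python
import os


def java_env(max_heap="32G", base=None):
    """Return a copy of the environment with -Xmx added to JAVA_TOOL_OPTIONS.

    Any options already present in JAVA_TOOL_OPTIONS are kept in front.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("JAVA_TOOL_OPTIONS", "").strip()
    env["JAVA_TOOL_OPTIONS"] = f"{existing} -Xmx{max_heap}".strip()
    return env
```

The result can be handed to subprocess.run(cmd, env=java_env("32G")), where cmd is the abra2 command line as a list of arguments.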

If you continue to have trouble, please open a distinct issue.

heqisun commented 4 years ago

Thank you very much.