mozack / abra2

ABRA2

java.lang.IllegalArgumentException: Contig too long #39

Closed by warthmann 4 years ago

warthmann commented 4 years ago

Hello, I am using abra2 as the indel realigner in my workflow, after alignment with bwa and before variant calling with freebayes. It has worked quite nicely until now. In my most recent run, the workflow crashed at the abra2 step, apparently because abra2 hit a contig length limit. The tail of my error log is attached. MAX_CONTIG_LEN seems to be hardcoded at 2000-1, while the contig in question is 2035 bases long. I am now wondering what my options are to get my workflow running. Can I override this limit somehow, or skip the contig? Why is there a length limit in the first place? Thanks a lot! This is somewhat urgent.

warthmann commented 4 years ago

I cannot see the attached file, so here is the error in plain text:

INFO Tue Dec 10 18:42:15 AEDT 2019 PROCESS_REGION_MSECS: Chr08_53568000_53568400 8 0 5 0 INFO Tue Dec 10 18:42:15 AEDT 2019 PROCESS_REGION_MSECS: Chr05_22678400_22678800 11 0 5 0 INFO Tue Dec 10 18:42:15 AEDT 2019 PROCESS_REGION_MSECS: Chr05_69297000_69297400 911 16 23 0 WARNING Tue Dec 10 18:42:15 AEDT 2019 In Region: Chr07_63000800_63001200, contig too long: [ATTGCATAAAAACGAGAACTGGAAGAGAACTACATATGGTCTGAGACTCACTTCCTTGAAAAGAGGAAGACTAGTTATGTAGGCAGTTACCGTGAAAGAAATGTATCTTTATCGGTTGCTCGACCAAAGCCTGTCAATAAGACAGATCCGAAACAATACCGGCTGTTCAATTTTAGAAAGGCCCATCTCCTGATTGTCTCAAATCGAATCAAACTCTTTAGCAATTGGTCTGACAGAACGGATGGTAAGGTGGGGGACCCATACCTTTATTCTATTCCCGTGCTTAGAGCGAGTGAATGAAAAGCTACGCCTGCTGCTTGTGTTGATTGAAAAGAAACCTGTCGGGAGACAGATCGATGTAGGCGCTCGTCCACTGTTGGCGGTGCTACACCATAGATTTCTAGGAGCAAGAAGCCGTAAAGAAGGTGAGGTGAAAGCCACACCCAAGAAGATAGGGAAGATCGGTCCAAAGAGAGTCTTTTTATGTTTCCTAGCTAGTTCAGAGAAAGAGGCATCTAAGGCAAAAGTCTTCACTCACTATAGTAGAATCTCTTTTCCGGGTGGAAGCTCGGTAAAAAGAGCTAAGTTCGTTGAAGCAATTCTCTTTTTTTTATTCCTACTACGTCCTCCAAAAAGAAAGATGCAAATGCTCTGCTGTCACCTCTCTTCTCTGCCTTTCTAGTCTAACCAAGCAAGCCCCCTGACTTTTACTATTGGGCTTTCGAATCTGACACGGCTCGAACACCCACCTTGTATAGCCCTTTCTCCCCACTCTTTTGCTGCCAACTCCCCTCTCCTCTTCTTTCAATAGAAGAAAGACGAGCCAATCTGCATAGAGTCATCTATAGAAAGACAAAACTTTACTGCTTGCTTTGGCGGTATATCTCCTAGAACTTTCCTAGGCCTTCCTTGTTTCGATCCCATCTTCTCATCTTTTTGGTGAGCCAGCCATTGTGTGTGTGTGTCCCAACTGTTCCTTCGCCTCGTAATGGGCTTTAATGCACTGATGCTCGTATGAGAATAGTCCGTGCCCCGGGAAGAAAGCTTATGTAAAGAGAAGAAAGCGGGGAAGAAAGCTTATGTATGTATTCGGATTCAGCTGTGAACACGAGGATGGGTACCCTATTATTGGTGTTGGTGGTTCCTCTAAAGCTACAGGAGCTAATAAAAGATCCGGGTTCGTTGGAAGAACAGAAAGAAAGGGTTCCAGCTTTTATCGAACGAGTAAGGGATCTTTGGATCCTTTTTTTTCAAGCTTAAAGGCGAGTAATCAATGTTGCTTCTTTTCCTTGAATATGGATATCTATGAGTTTTCGCACATCCTTTGTTTTGTGTGAAATGGCCTTGACTAGTTGGATTGAGTATTAGGAGAGCAATGGACCTTTATCCGAAGGAGCACTTCAAGGATTGAATCTCCCACTTAATAAGATTGCGTTGGAATGGAAAGCTAAAGCGAAGTGAATAGGAAGTGTATGATTCCCTGGGGATTGGGAGGAGTCCTAGAAAAAGAGAATGAGAGAGCGGACCATTTAAATAGGTATAAACTAATGCATTCATTTTATCACTTCTCTATGGTAATCTCGTCTCGTTGTAGTCCAATTTTGTTTTGATGTGTCAATCGGCGATGGAAGGATTT
GGCTTCAAGCCAAGACAAGGTACTACGGCTGATAATGCACTCGAGAAGTTTAGGTTTACGAAACTACATTTCCCTTCCCGGGCATTAATCGCTTGAATAAGCACGTAGCAATAGCTGATTCTTTTCAATGGCTTGGTAAAGTTTAGTCAAATGAGTAAAAGAGGATAGGCTAAGGAGGGTAGAAGGCACGAAGTGATTGAGGCAACACTTTCGATTTAATAGCTCTTCCTCATCTGACATACTACCTAACCTTGTATACTGAGATTACGAAGCAGAAGCAGCCTTATCTGACAGACTTCCTTATCTCTGGCTTAACTGCATCTGTCCTTCATTTCACTCTTTCGTACTTCATTTTCAATTGCACTGAAGTTCACTAACAACTCTAAGCATACTAA]
INFO Tue Dec 10 18:42:15 AEDT 2019 PROCESS_REGION_MSECS: Chr05_22678600_22679000 14 0 5 0
INFO Tue Dec 10 18:42:15 AEDT 2019 PROCESS_REGION_MSECS: Chr05_69297200_69297600 42 0 24 0
java.lang.IllegalArgumentException: Contig too long
    at abra.NativeSemiGlobalAligner.align(NativeSemiGlobalAligner.java:27)
    at abra.ContigAligner.align(ContigAligner.java:73)
    at abra.ReAligner.alignContig(ReAligner.java:995)
    at abra.ReAligner.assemble(ReAligner.java:1134)
    at abra.ReAligner.processRegion(ReAligner.java:1293)
    at abra.ReAligner.processChromosomeChunk(ReAligner.java:361)
    at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
    at abra.AbraRunnable.run(AbraRunnable.java:20)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

mozack commented 4 years ago

ABRA2 uses a buffer of size 2k per thread for contig alignment. Given that regions of 400 bases are processed by default, this seemed like a reasonable default and this is the first I've heard of the 2k threshold being exceeded. Rather than allowing the buffer to be overrun, which might result in undefined behavior, we check up front.
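For readers unfamiliar with this kind of guard, here is a minimal sketch of an up-front length check in front of a fixed-size buffer. It is not ABRA2's actual source: the class `ContigLengthGuard` and its method names are hypothetical, and the limit of 2000 - 1 and the exception message are taken only from what is reported in this thread.

```java
/** Hypothetical sketch of an up-front contig length check; not ABRA2's actual code. */
final class ContigLengthGuard {
    // Assumed limit: the value reported in this thread (2000 - 1).
    static final int MAX_CONTIG_LEN = 2000 - 1;

    /** Returns true when the contig would fit in the fixed-size buffer. */
    static boolean fits(String contig) {
        return contig.length() <= MAX_CONTIG_LEN;
    }

    /** Fails fast instead of letting a later step write past the buffer. */
    static void requireFits(String contig) {
        if (!fits(contig)) {
            throw new IllegalArgumentException("Contig too long");
        }
    }

    public static void main(String[] args) {
        // Build a dummy contig of 2035 bases, the length reported above.
        String contig = new String(new char[2035]).replace('\0', 'A');
        try {
            requireFits(contig);
            System.out.println("Contig accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected contig of length " + contig.length());
        }
    }
}
```

Checking before the alignment call turns a possible silent memory overrun into an explicit, reportable error, which is the trade-off described above.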

If you're able to share a BAM snippet that reproduces the problem, I will take a closer look.

If you'd like a quick workaround, you might consider dropping the region that causes the problem for this particular run.
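One way to drop a region, assuming the run is restricted to intervals listed in a BED file, is to filter that file before starting the run. The sketch below is illustrative only: `RegionFilter` and its methods are hypothetical helpers, and how targets are actually passed to ABRA2 should be checked against its documentation.

```java
/** Hypothetical helper for skipping one region; assumes targets are given as BED lines. */
final class RegionFilter {
    /** True when a BED line (0-based, half-open) overlaps [start, end) on the given contig. */
    static boolean overlaps(String bedLine, String contig, long start, long end) {
        String[] fields = bedLine.trim().split("\\s+");
        // Header or comment lines never match the contig name, so they are kept.
        if (fields.length < 3 || !fields[0].equals(contig)) {
            return false;
        }
        long lineStart = Long.parseLong(fields[1]);
        long lineEnd = Long.parseLong(fields[2]);
        return lineStart < end && start < lineEnd;
    }

    /** Returns the BED text with every interval overlapping the region removed. */
    static String dropRegion(String bedText, String contig, long start, long end) {
        StringBuilder kept = new StringBuilder();
        for (String line : bedText.split("\n")) {
            if (line.trim().isEmpty() || overlaps(line, contig, start, end)) {
                continue;
            }
            kept.append(line).append('\n');
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String bed = "Chr07\t63000000\t63000800\n"
                   + "Chr07\t63000800\t63001200\n"
                   + "Chr08\t53568000\t53568400\n";
        // Coordinates of the region named in the warning above.
        System.out.print(dropRegion(bed, "Chr07", 63000800L, 63001200L));
    }
}
```

Note that this removes whole intervals, so it only makes sense when the target list is already split into small intervals. A single chromosome-wide interval would be dropped entirely.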

warthmann commented 4 years ago

Dear Lisle Mose,

thank you for looking into this. abra2 has crashed when analysing this chunk: Chr07:63000800-63001200

I have extracted the respective region, plus a bit upstream and downstream, like so:

samtools view -h output/alignments/sets/bwa~Sbicolor_454_v3.0.1~all_samples.bam Chr07:63000000-63002000 > problematic_region_for_abra2.sam
bgzip problematic_region_for_abra2.sam

Please let me know if you need more. It would also be good to know how the 2 kb buffer relates to the 400 bp regions.

Best, Norman




mozack commented 4 years ago

Fixed in v2.23