haansi / mitolib

Apache License 2.0

haplochecker `ArrayIndexOutOfBoundsException` with test data #9

Open williamrowell opened 6 years ago

williamrowell commented 6 years ago

I tried haplochecker with my own alignments (long PacBio amplicons) and got the following error. To test whether it was due to our data, I downloaded the HG96 test data, only to get the same error. I don't get this error with your HG01500 test data. Do you have any idea what could be causing this problem?

Error with BAM file
java.lang.ArrayIndexOutOfBoundsException: 16574
    at genepi.mitolib.bam.BAMReader.build(BAMReader.java:392)
    at genepi.mitolib.contChecker.HaploChecker.build(HaploChecker.java:91)
    at genepi.mitolib.contChecker.HaploChecker.run(HaploChecker.java:71)
    at genepi.base.Tool.start(Tool.java:193)
    at genepi.base.Toolbox.start(Toolbox.java:44)
    at genepi.mitolib.Tools.main(Tools.java:46)

Despite this error, I couldn't discern any missing output from the HG96 analysis.

Thanks!

haansi commented 6 years ago

The handling of reads that span the D-loop (..., 16569, 1, 2, 3, ...) leads to this exception; I have to check the alignment/mapping information here (insertions, deletions, ...). If this error appears once, it means one read is affected, and the pipeline should still run through and produce a result. I'll keep you updated!

Alternatively you could also give https://github.com/seppinho/mutation-server a try
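
For readers following along, here is a minimal sketch of the wrap-around idea mentioned above: on a circular reference, a position past the last base maps back to the start. It is not taken from mitolib; the class name `CircularReference`, the helper `wrap`, and the assumption of a 16569-base reference with 1-based positions are illustrative only.

```java
final class CircularReference {
    // Assumed reference length (1-based positions 1..16569); hypothetical constant.
    static final int REFERENCE_LENGTH = 16569;

    /** Maps any 1-based position onto the circular reference, e.g. 16570 -> 1. */
    static int wrap(int position) {
        return Math.floorMod(position - 1, REFERENCE_LENGTH) + 1;
    }

    public static void main(String[] args) {
        // Per-position counter; index 0 is left unused so indices match positions.
        int[] coverage = new int[REFERENCE_LENGTH + 1];

        // A made-up read starting at 16567 and covering 6 reference bases,
        // so it runs across the end of the reference and back to position 3.
        for (int offset = 0; offset < 6; offset++) {
            coverage[wrap(16567 + offset)]++;
        }

        System.out.println("16569: " + coverage[16569]);
        System.out.println("1: " + coverage[1]);
        System.out.println("3: " + coverage[3]);
    }
}
```

Without the `wrap` call, the same loop would index past the end of the array, which is the general kind of failure an `ArrayIndexOutOfBoundsException` reports.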

williamrowell commented 6 years ago

I hit an unrelated error on mutation-server, but I'm sure with a little work on my part, I can get these to work. I really appreciate anything you can do to help as well!

williamrowell commented 6 years ago

@haansi I'm wondering if this has to do with the more indel-error-prone nature of long reads. Is it possible that, in some edge cases, CIGAR string insertion operations cause the position counter to extend beyond the end of the reference?

I do get results, but only for the first ~9kb of reference, after which I only get a few (<10) counts per base. It just so happens that I have reads mapping from roughly 9kb to the end of the reference.
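
To make the CIGAR question concrete, here is a small sketch of how reference positions advance under the SAM specification, where only `M`, `D`, `N`, `=`, and `X` consume reference bases and insertions (`I`) do not. It is not mitolib's code; the class `CigarSpan`, its methods, and the example read are hypothetical, and the "naive" variant only illustrates how counting insertions could push a position counter past the reference end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CigarSpan {
    // One CIGAR element: a length followed by an operation code.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Sums the lengths of all CIGAR elements whose operation is in ops. */
    private static int sum(String cigar, String ops) {
        int total = 0;
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            if (ops.indexOf(m.group(2).charAt(0)) >= 0) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    /** Reference bases consumed per the SAM spec: M, D, N, = and X only. */
    static int referenceLength(String cigar) {
        return sum(cigar, "MDN=X");
    }

    /** Last 1-based reference position covered by an alignment. */
    static int alignmentEnd(int start, String cigar) {
        return start + referenceLength(cigar) - 1;
    }

    /** Illustrative bug: also advances the reference position on insertions. */
    static int naiveAlignmentEnd(int start, String cigar) {
        return start + sum(cigar, "MIDN=X") - 1;
    }

    public static void main(String[] args) {
        // Constructed example, not taken from the HG96 data.
        int start = 16560;
        String cigar = "5M5I5M";
        System.out.println("spec-conformant end: " + alignmentEnd(start, cigar));
        System.out.println("naive end: " + naiveAlignmentEnd(start, cigar));
    }
}
```

In this constructed case the spec-conformant end stays on the reference while the naive one overshoots it, so a bounds check on the computed position before indexing a per-base array would catch the problem early.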