mozack / abra

Assembly Based ReAligner
MIT License

Error running on custom BED file #15

Closed rhshah closed 9 years ago

rhshah commented 9 years ago

Hi, I am trying to run ABRA on tumor-normal pairs using a custom BED file created with GATK's FindCoveredIntervals tool. We do this because we also want to run ABRA on off-target regions, for better off-target variant calling. While doing this, I got the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4143
    at abra.CompareToReference2.getBaseAsChar(CompareToReference2.java:368)
    at abra.CompareToReference2.getSequence(CompareToReference2.java:401)
    at abra.ReAligner.cleanAndOutputContigs(ReAligner.java:952)
    at abra.ReAligner.alignAndCleanContigs(ReAligner.java:523)
    at abra.ReAligner.reAlign(ReAligner.java:188)
    at abra.ReAligner.run(ReAligner.java:1306)
    at abra.Abra.main(Abra.java:12)

I would appreciate your insights on this error and how to avoid it.

mozack commented 9 years ago

Could you please send me your bed file for this sample along with a BAM header (or something that lists the reference sequences and lengths)?
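One quick check before sending the files is to compare each BED interval against the reference lengths listed in the BAM header. The following is a minimal sketch, not part of ABRA: the function names `parse_sq_lengths` and `out_of_range_intervals` are hypothetical, and it assumes the header has been dumped as text (`@SQ` lines with `SN:` and `LN:` tags) and that the BED file uses standard 0-based, half-open coordinates.

```python
def parse_sq_lengths(header_lines):
    """Collect {sequence name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def out_of_range_intervals(bed_lines, lengths):
    """Return BED intervals that name an unknown sequence or run past its end."""
    problems = []
    for line in bed_lines:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        # BED is 0-based, half-open, so a valid end is at most the length.
        if chrom not in lengths or start < 0 or start >= end or end > lengths[chrom]:
            problems.append((chrom, start, end))
    return problems
```

Any interval this reports is worth inspecting by hand. An empty result does not rule out the problem, since an interval can be fully in range while a contig assembled from it still maps close to the end of the sequence.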

rhshah commented 9 years ago

I am not sure how to attach a text file here. I tried replying to your email, but it failed. Let me know where I should send those files.

mozack commented 9 years ago

Please send to: lmose at unc dot edu

mozack commented 9 years ago

Thanks. You've uncovered a bug in our handling of contigs mapping near the ends of chromosomes. I've committed a fix (lightly tested so far) to the head. Unfortunately, the head has some other recent commits that should undergo additional testing. I expect to have a release including this change available sometime next week.

If you need this urgently, applying this same commit to a previous release should work fine. I can help with that if needed.

Here's the change: https://github.com/mozack/abra/commit/b0a2e67d6f824da16ff7530d7aa86ff89b0610cb

For your specific test case, the problematic contig appears to be mapping near the end of chromosome MT.
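To illustrate the general class of bug (this is not the code from the commit above), here is a minimal sketch of fetching a reference span that is truncated at the end of the sequence instead of indexing past it. The function name `get_sequence` and its 1-based `start` convention are assumptions made for the example; they are not taken from `CompareToReference2`.

```python
def get_sequence(reference, start, length):
    """Return up to `length` bases beginning at 1-based position `start`.

    The span is clamped to the bounds of `reference`, so a request that
    runs off the end of the chromosome yields a shorter string rather
    than an out-of-bounds access.
    """
    begin = max(start - 1, 0)                  # convert to 0-based, clamp at 0
    end = min(begin + length, len(reference))  # clamp at the sequence end
    if begin >= end:
        return ""                              # span lies entirely past the end
    return reference[begin:end]
```

With an 8-base reference, a request for 10 bases starting at position 6 returns only the last 3 bases. The caller then has to handle a result shorter than requested.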

Lastly, I noticed from your logs that you are passing in kmer values on the command line. Abra now can automatically calculate appropriate kmer sizes on a per region basis. We see improved results using this approach. Just omit the kmer param if you'd like to give it a try.

rhshah commented 9 years ago

Cool, thanks for your quick reply; I appreciate the quick fix. I will wait for you to release the new code. Do you have a summary of improvements for your next release? Also, do you know if we can make this code work for amplicon-based datasets, where reads have fixed start and stop sites?

I know about the automatic k-mer size selection. I was testing this for our next release in the pipeline. But we will make sure to test it without the k-mer values once you upload the new code.

Just FYI: we would also like to thank you for this amazing tool. One of my summer high school students evaluated it last year, and here is his poster: http://www.slideshare.net/rshah7/final-posterhopp

mozack commented 9 years ago

Wow, thanks for the feedback!

We don't typically use the fixed start/stop amplicon datasets you've described. If you have a test set that you are able to share I'd be happy to take a look. I am a bit skeptical though as the assembly generally works better with some read complexity across the variant. Are you using this amplicon method for discovery or for validation?

Will put together notes describing the changes in next week's release.

rhshah commented 9 years ago

Thanks for working on this. We are using the amplicon data for discovery of known variation, and we are missing some variants. I agree that the lack of read complexity will make this hard. I will mail you the scrubbed data and we can go from there.

Thanks, Ronak

mozack commented 9 years ago

Sorry, but the forthcoming release is going to have to slide to next week.

rhshah commented 9 years ago

OK, thanks for keeping me in the loop. I am currently trying to gather scrubbed amplicon-based data for testing and will update you once I have it.

mozack commented 9 years ago

The original bug reported should be resolved in v0.91. Please let me know if you run into any more problems.