mozack / abra

Assembly Based ReAligner
MIT License

Weird error ... #8

Closed joonlee3 closed 9 years ago

joonlee3 commented 9 years ago

Whenever I run ABRA, I get the following error. Would you please look into it? Thanks.

Loading native library from: /scratch/BREIGR0124_VAxg_01_1408120760/libAbra.so
Loading reference map: /site/ne/home/wings/ref_data/reference_genome/hg19/chrUn_included/ucsc.hg19.fasta
Done loading ref map. Elapsed secs: 112
Fri Aug 15 12:41:14 EDT 2014 : Reading Input SAM Header and identifying read length
Fri Aug 15 12:41:14 EDT 2014 : Identifying header and determining read length
Min insert length: 0
Max insert length: 240721460
Fri Aug 15 12:42:47 EDT 2014 : Max read length is: 100
Fri Aug 15 12:42:47 EDT 2014 : Min contig length: 101
Fri Aug 15 12:42:47 EDT 2014 : Read length: 100
Fri Aug 15 12:42:47 EDT 2014 : Loading target regions
Exception in thread "main" java.lang.NumberFormatException: For input string: "+"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:484)
    at java.lang.Integer.parseInt(Integer.java:527)
    at abra.RegionLoader.load(RegionLoader.java:42)
    at abra.ReAligner.getRegions(ReAligner.java:784)
    at abra.ReAligner.loadRegions(ReAligner.java:794)
    at abra.ReAligner.reAlign(ReAligner.java:122)
    at abra.ReAligner.run(ReAligner.java:1282)
    at abra.Abra.main(Abra.java:12)

My command:

java -Xmx16g -jar ${ABRA_JAR} \
    --in ${sample_id}.all.sorted.dedup.bam \
    --out ${sample_id}.all.sorted.dedup.realigned.bam \
    --ref ${reference_genome} \
    --targets ${!target_bed_file_path} \
    --threads 4 --mad 20000 --mbq 27 \
    --working ${temp_dir}

mozack commented 9 years ago

ABRA looks for an optional k-mer size in the fourth column of the targets file, so the "+" in that column of your file fails integer parsing. Please create a BED file with only the first 3 columns. I'll put handling this more elegantly on the todo list.
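One way to produce the 3-column file is sketched below in Python. This is a minimal sketch and not part of ABRA: the function names and the file names in the usage note are hypothetical, and it assumes whitespace-separated BED records and that header lines (`track`, `browser`, `#`) can safely be dropped.

```python
def first_three_columns(lines):
    """Return BED records reduced to chrom, start, end (tab-separated).

    Assumption: blank lines and BED header lines (track/browser/#) are
    dropped rather than passed through.
    """
    records = []
    for line in lines:
        fields = line.split()  # BED fields may be tab- or space-separated
        if not fields:
            continue  # skip blank lines
        if fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # skip header/comment lines
        if len(fields) < 3:
            raise ValueError("BED record has fewer than 3 columns: %r" % line)
        records.append("\t".join(fields[:3]))
    return records


def trim_bed(src_path, dst_path):
    """Write a copy of src_path that keeps only the first three columns."""
    with open(src_path) as src:
        records = first_three_columns(src)
    with open(dst_path, "w") as dst:
        for record in records:
            dst.write(record + "\n")
```

For example, `trim_bed("targets.bed", "targets.3col.bed")` (both file names are placeholders) would write the trimmed file, which can then be passed to `--targets`.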

Also, the --mbq param is a positional sum of base qualities. Using a value of 27 is likely to generate a lot of noise during assembly. I'd also recommend using a much smaller value for --mad.
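To illustrate why a summed threshold of 27 is permissive, here is a toy Python sketch. It is not ABRA's code: it assumes the sum is taken per position over every read that supports a given k-mer, which is one plausible reading of "positional sum" and has not been checked against the ABRA source. All function names and quality values are made up for illustration.

```python
def positional_quality_sums(kmer_observations):
    """Sum Phred qualities column by column across observations of one k-mer.

    kmer_observations: list of equal-length lists, one list of base
    qualities per read that contains the k-mer (hypothetical layout).
    """
    return [sum(column) for column in zip(*kmer_observations)]


def passes_threshold(kmer_observations, min_sum):
    """True if every position's summed quality reaches min_sum."""
    sums = positional_quality_sums(kmer_observations)
    return all(total >= min_sum for total in sums)


# One read alone clears a summed threshold of 27 ...
one_read = [[30, 30, 28, 31]]
# ... but needs support from additional reads to clear 60.
three_reads = [[30, 30, 28, 31], [25, 32, 30, 29], [33, 27, 30, 30]]

print(passes_threshold(one_read, 27))     # True
print(passes_threshold(one_read, 60))     # False
print(passes_threshold(three_reads, 60))  # True
```

Under this assumption, a threshold of 27 lets a k-mer seen in a single read through, which is consistent with the noise concern above, while a higher value requires agreement across several reads.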

joonlee3 commented 9 years ago

Thank you so much for your reply and suggestions.

I thought --mbq is a single base quality score threshold. Do you have any other suggestions regarding parameter setting?

Many thanks, Joon

mozack commented 9 years ago

Optimal settings will ultimately depend on your data. I've recently changed the defaults for --mad and --mbq to 150 and 60 respectively, and those should be a good starting point, though that change hasn't been released yet. If you're dealing with much lower coverage (say 15X) and wish to detect lower-frequency somatic variation, you might experiment with throttling --mbq back down to the 40 range.

mozack commented 9 years ago

...and if you're dealing with very high depth, you may wish to increase --mnf and --mbq to prune the assembly graph more aggressively.