lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

assembling canu corrected reads #11

Closed: laurencowley closed this issue 8 years ago

laurencowley commented 8 years ago

Hi, so I am trying to assemble ONT reads that have been corrected with canu, but I am getting a strange error:

Laurens-MacBook-Pro:Ecoli_644 laurencowley$ ../git_repo/miniasm/miniasm -f source_canu_lowcov/Ecoli644.correctedReads.fasta.gz Ecoli644_correctedminimap.paf.gz > Ecoli644_miniasm.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::1.850_1.00] read 1687161 hits; stored 1257327 hits and 14342 sequences (124608869 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::2.042_1.00] 14340 query sequences remain after sub
[M::ma_hit_cut::2.068_1.00] 1257303 hits remain after cut
[M::ma_hit_flt::2.096_0.99] 1112831 hits remain after filtering; crude coverage after filtering: 54.73
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::2.153_0.99] 14335 query sequences remain after sub
[M::ma_hit_cut::2.179_0.99] 1112203 hits remain after cut
[M::ma_hit_chimeric::2.207_0.99] identified 143 chimeric reads
[M::ma_hit_contained::2.236_0.99] 749 sequences and 10538 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 10294 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 3498 arcs
[M::asg_arc_del_multi] removed 1652 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 8 tips
[M::asg_pop_bubble] popped 11 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 345 asymmetric arcs
[M::asg_arc_del_short] removed 1725 short overlaps
[M::asg_cut_tip] cut 8 tips
[M::asg_pop_bubble] popped 127 bubbles and trimmed 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 29 asymmetric arcs
[M::asg_arc_del_short] removed 47 short overlaps
[M::asg_cut_tip] cut 1 tips
[M::asg_pop_bubble] popped 15 bubbles and trimmed 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 9 asymmetric arcs
[M::asg_arc_del_short] removed 11 short overlaps
[M::asg_cut_tip] cut 1 tips
[M::asg_pop_bubble] popped 4 bubbles and trimmed 0 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 1 asymmetric arcs
[M::asg_arc_del_short] removed 1 short overlaps
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitigs <===
Assertion failed: (sub[id].e - sub[id].s <= ks->seq.l), function ma_ug_seq, file asm.c, line 267.
Abort trap: 6

Do you know why this might be?

lh3 commented 8 years ago

Could you check if the input contains duplicated read names?

laurencowley commented 8 years ago

I don't think so. The reads are in FASTA format, with read names like this:

143b9016-9b2f-4a9f-b3e7-a87378b5afa2_Basecall_2D_2d NBCOL1105_Ecoli_644_3618_1_ch32_file16_strand

lh3 commented 8 years ago

Could you run the following and see if it gives any output?

zcat input.fa.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d
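For anyone who prefers a script to the pipeline above, here is a minimal Python sketch of the same check. Like the Perl regex />(\S+)/, it takes a read's name to be the first whitespace-delimited token of its header line, so for ONT headers such as the one quoted earlier only the part before the space counts. The function names read_names and duplicated_names are hypothetical, and the sketch assumes gzip-compressed input files end in .gz.

```python
import gzip
from collections import Counter


def read_names(path):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    # Assumption: compressed files are recognised by their .gz suffix.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def duplicated_names(path):
    """Return every name seen more than once, sorted, one entry per name.

    This mirrors `sort | uniq -d`: an empty list means no duplicates.
    """
    counts = Counter(read_names(path))
    return sorted(name for name, n in counts.items() if n > 1)
```

Calling duplicated_names("Ecoli644.correctedReads.fasta.gz") should return an empty list for a file that is safe to use in this respect; any names it returns are the ones to fix before rerunning.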
laurencowley commented 8 years ago

OK, it printed the read names in order and there are no repeats.

lh3 commented 8 years ago

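That command would produce no output if there were no duplicated read names. The -d option makes uniq print only duplicated lines (sort first brings identical names next to each other, since uniq only compares adjacent lines).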

laurencowley commented 8 years ago

Oh, I see! Right, I will sort out the file and try miniasm again. Sorry!
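One way to "sort out the file" is to keep only the first record for each read name. The Python sketch below does that; the function name dedup_fasta is hypothetical, and keeping the first copy (rather than, say, renaming the later ones) is an assumption that only makes sense if the duplicated records are redundant. Because the PAF file was computed from the original reads, it is probably safest to rerun minimap on the cleaned file before running miniasm again.

```python
import gzip


def dedup_fasta(in_path, out_path):
    """Copy a FASTA file, keeping only the first record for each read name.

    The name is the first whitespace-delimited token of the header line.
    Files ending in .gz are read/written with gzip. Returns the number of
    records dropped.
    """
    in_open = gzip.open if in_path.endswith(".gz") else open
    out_open = gzip.open if out_path.endswith(".gz") else open
    seen = set()
    dropped = 0
    keep = False  # whether lines of the current record are being written
    with in_open(in_path, "rt") as src, out_open(out_path, "wt") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name not in seen
                if keep:
                    seen.add(name)
                else:
                    dropped += 1
            if keep:
                dst.write(line)
    return dropped
```

Multi-line sequences are handled because every line after a header inherits that header's keep/drop decision.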

lh3 commented 8 years ago

Never mind. I should have made minimap/miniasm check the input read names for duplicates. This is actually a common error.
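As an illustration of the kind of input check mentioned here, the Python sketch below assigns each read name a sequential integer ID and stops at the first name it has already seen, instead of letting two different reads share one entry. This is not miniasm's actual C code; build_name_index and DuplicateNameError are hypothetical names chosen for the example.

```python
class DuplicateNameError(ValueError):
    """Raised when the same read name appears twice in the input."""


def build_name_index(names):
    """Map each read name to a sequential integer ID.

    Fails fast on a repeated name instead of silently reusing the ID
    that was assigned to the earlier read.
    """
    index = {}
    for name in names:
        if name in index:
            raise DuplicateNameError(f"duplicated read name: {name}")
        index[name] = len(index)
    return index
```

Fed with the header names of a FASTA file, a check like this reports the offending name at load time rather than leaving the problem to surface later as an assertion failure.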