lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License
299 stars, 68 forks

Assertion error #6

levinas closed this issue 8 years ago

levinas commented 8 years ago

The read files are from: http://www.ncbi.nlm.nih.gov/sra/SRX1433261%5Baccn%5D

I converted them to FASTA using seqtk seq -A.

$ cat SRR2917853_11.fasta SRR2917853_13.fasta SRR2917853_15.fasta SRR2917853_17.fasta SRR2917853_19.fasta SRR2917853_1.fasta SRR2917853_21.fasta SRR2917853_23.fasta SRR2917853_25.fasta SRR2917853_27.fasta SRR2917853_29.fasta SRR2917853_2.fasta SRR2917853_31.fasta SRR2917853_3.fasta SRR2917853_4.fasta SRR2917853_5.fasta SRR2917853_7.fasta SRR2917853_9.fasta | gzip -1 >reads.fa.gz

$ minimap -Sw5 -L100 -m0 -t4 reads.fa.gz reads.fa.gz | gzip -1 > reads.paf.gz

$ miniasm -f reads.fa.gz reads.paf.gz > reads.gfa

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::26.445*1.00] read 19770006 hits; stored 33249361 hits and 69144 sequences (1261322966 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::32.234*1.00] 61791 query sequences remain after sub
[M::ma_hit_cut::32.857*1.00] 28346032 hits remain after cut
[M::ma_hit_flt::33.466*1.00] 19148813 hits remain after filtering; crude coverage after filtering: 211.85
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::34.669*1.00] 61081 query sequences remain after sub
[M::ma_hit_cut::35.021*1.00] 18761267 hits remain after cut
[M::ma_hit_contained::35.507*1.00] 3085 sequences and 79066 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 64665 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 35554 arcs
[M::asg_arc_del_multi] removed 4 multi-arcs
[M::asg_arc_del_asymm] removed 2273 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 310 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 3034 asymmetric arcs
[M::asg_arc_del_short] removed 8080 short overlaps
[M::asg_cut_tip] cut 432 tips
[M::asg_pop_bubble] popped 6 bubbles and trimmed 3 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 975 asymmetric arcs
[M::asg_arc_del_short] removed 1177 short overlaps
[M::asg_cut_tip] cut 233 tips
[M::asg_pop_bubble] popped 28 bubbles and trimmed 4 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 814 asymmetric arcs
[M::asg_arc_del_short] removed 908 short overlaps
[M::asg_cut_tip] cut 272 tips
[M::asg_pop_bubble] popped 51 bubbles and trimmed 3 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 215 internal sequences
[M::asg_cut_biloop] cut 384 small bi-loops
[M::asg_cut_tip] cut 71 tips
[M::asg_pop_bubble] popped 21 bubbles and trimmed 4 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 269 asymmetric arcs
[M::asg_arc_del_short] removed 317 short overlaps
[M::asg_cut_tip] cut 130 tips
[M::asg_pop_bubble] popped 43 bubbles and trimmed 14 tips
[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.

Used minimap 7a2e4df and miniasm c0a8e44.

lh3 commented 8 years ago

That is because the read names in the input are not unique.
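One way to avoid the name clash when concatenating several FASTA files is to prefix each read name with the name of the file it came from. The following is only a minimal sketch, not something taken from minimap or miniasm: the function names and the `prefix:name` scheme are hypothetical, and it assumes that the file stems are distinct and that names are already unique within each file.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def merge_with_unique_names(fasta_paths, out_path):
    """Concatenate FASTA files into a gzip file, prefixing each read name
    with the stem of its source file. Returns the number of records written.
    """
    n_records = 0
    with gzip.open(out_path, "wt") as out:
        for fasta_path in fasta_paths:
            # e.g. "SRR2917853_11.fasta" -> prefix "SRR2917853_11"
            prefix = Path(fasta_path).name.split(".")[0]
            with open_text(fasta_path) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        # Guard against a file without a trailing newline.
                        line += "\n"
                    if line.startswith(">"):
                        # Hypothetical naming scheme: ">prefix:original_name"
                        line = ">" + prefix + ":" + line[1:]
                        n_records += 1
                    out.write(line)
    return n_records
```

Called with the list of per-run FASTA files and an output path such as reads.fa.gz, this would replace the plain `cat ... | gzip -1` step; the overlap and assembly commands would then run on the merged file unchanged.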

lh3 commented 8 years ago

Btw, it is hard for minimap to detect this input error: it would need to build a hash table of all read names, which takes memory. In addition, I have assembled the largest file, SRR2917853_1.fastq.gz, and got three unitigs longer than 100 kb. I guess this run was not trimmed, because the reads are too long. Untrimmed reads may affect the quality of the assembly.
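The same check can be run outside minimap before mapping. The sketch below is an illustration rather than part of either tool: `read_names` and `duplicated_names` are hypothetical helpers, and the read name is assumed to be the first whitespace-delimited word of each header line. It keeps every name in memory at once, which is the cost mentioned above.

```python
import gzip
from collections import Counter


def read_names(path):
    """Yield the name (first word of the header) of every FASTA record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def duplicated_names(path):
    """Return {name: count} for every read name that occurs more than once."""
    # The Counter holds every distinct name, so memory grows with read count.
    counts = Counter(read_names(path))
    return {name: n for name, n in counts.items() if n > 1}
```

An empty result from `duplicated_names("reads.fa.gz")` would mean the names are unique; any entries returned are names shared by more than one read.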

levinas commented 8 years ago

Thanks.