lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed. #24

Closed mictadlo closed 7 years ago

mictadlo commented 7 years ago

Hi, I got the following error: miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.

[M::main] ===> Step 0: removing contained reads <===
[M::ma_hit_no_cont::2767.695*1.00] dropped 609653 contained reads
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::7011.622*1.00] read 2185758957 hits; stored 2333379196 hits and 2773338 sequences (40113856176 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::7353.708*1.00] 2659876 query sequences remain after sub
[M::ma_hit_cut::7483.210*1.00] 2139917924 hits remain after cut
[M::ma_hit_flt::7547.623*1.00] 547748330 hits remain after filtering; crude coverage after filtering: 109.85
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::7562.296*1.00] 2639454 query sequences remain after sub
[M::ma_hit_cut::7590.590*1.00] 488640251 hits remain after cut
[M::ma_hit_contained::7619.274*1.00] 339002 sequences and 8459816 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 5136921 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 2569347 arcs
[M::asg_arc_del_multi] removed 6952 multi-arcs
[M::asg_arc_del_asymm] removed 161996 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 13178 tips
[M::asg_pop_bubble] popped 3970 bubbles and trimmed 84 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 194125 asymmetric arcs
[M::asg_arc_del_short] removed 1070031 short overlaps
[M::asg_cut_tip] cut 15410 tips
[M::asg_pop_bubble] popped 5473 bubbles and trimmed 415 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 58017 asymmetric arcs
[M::asg_arc_del_short] removed 91873 short overlaps
[M::asg_cut_tip] cut 8879 tips
[M::asg_pop_bubble] popped 2763 bubbles and trimmed 337 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 34260 asymmetric arcs
[M::asg_arc_del_short] removed 49438 short overlaps
[M::asg_cut_tip] cut 9207 tips
[M::asg_pop_bubble] popped 2694 bubbles and trimmed 555 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 8866 internal sequences
[M::asg_cut_biloop] cut 16632 small bi-loops
[M::asg_cut_tip] cut 919 tips
[M::asg_pop_bubble] popped 308 bubbles and trimmed 104 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 15051 asymmetric arcs
[M::asg_arc_del_short] removed 19751 short overlaps
[M::asg_cut_tip] cut 3665 tips
[M::asg_pop_bubble] popped 1094 bubbles and trimmed 601 tips
[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.
/var/spool/PBS/mom_priv/jobs/2106550.pbs.SC: line 23: 38560 Aborted                 miniasm -Rc2 -f /work/waterhouse_team/fruit/contamination-free-pacbio/RSnSeQ_110fruit-rm-clean-rmsmrtbell-gt-10k.fastq fruit-gt-10k/fruit.paf.gz > fruit-gt-10k/fruit.gfa
[M::mm_idx_gen::0.005*0.00] collected minimizers

What did I miss?

Thank you in advance.

Michal

RyanBio commented 7 years ago

You may have reads with the same name, or reads whose names are identical up to the first space. Check it like this: zcat input.fa.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d
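
For FASTQ input (as in the original command), the same check can be sketched in Python. This is only an illustration of the one-liner above, not part of miniasm: the function names are made up here, it treats the first whitespace-delimited token of each header as the read name, and it assumes FASTQ records are exactly four lines each.

```python
import gzip
import sys
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def first_token(header):
    """Return the read name: the header text up to the first whitespace."""
    tokens = header[1:].split()
    return tokens[0] if tokens else ""


def read_names(path):
    """Yield the name of every record in a FASTA or four-line FASTQ file."""
    with open_text(path) as handle:
        first = handle.readline()
        if not first:
            return
        yield first_token(first)
        if first.startswith(">"):
            # FASTA: every line starting with '>' is a header.
            for line in handle:
                if line.startswith(">"):
                    yield first_token(line)
        elif first.startswith("@"):
            # FASTQ (assumed four lines per record): headers are lines 0, 4, 8, ...
            for i, line in enumerate(handle, start=1):
                if i % 4 == 0 and line.strip():
                    yield first_token(line)
        else:
            raise ValueError("not a FASTA or FASTQ file: " + path)


def duplicated_names(path):
    """Return the sorted list of read names that occur more than once."""
    counts = Counter(read_names(path))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__" and len(sys.argv) == 2:
    for name in duplicated_names(sys.argv[1]):
        print(name)
```

Saved under a name of your choice and run with the reads file as its only argument, it prints each duplicated name once; no output means no duplicates under the assumptions above.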

mictadlo commented 7 years ago

Thank you. I used rename.sh in=input.fasta out=output.fasta prefix=test and it fixed the duplicate-name problem.

roblanf commented 6 years ago

For others who might encounter this issue too, I thought I'd just add here that rename.sh is a script in the BBtools package: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/

mw55309 commented 3 years ago

So what is the likely issue when one encounters this error but you don't have duplicated IDs?

@ryanbio @mictadlo @roblanf

[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.
/bin/bash: line 1: 13225 Aborted                 (core dumped) miniasm -f filtered/DC_concat.fq.gz overlaps/DC_concat.paf.gz > miniasm/DC_concat.gfa

lh3 commented 3 years ago

@mw55309 I don't know. I have only seen this assertion failure when there are duplicated read names.
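
When read names are unique, one more thing that can be checked cheaply is whether the PAF file and the reads file agree with each other. The assertion text says that a read interval is longer than the sequence loaded from the reads file, so a PAF produced from a different version of the reads would be a plausible trigger. That is a guess from the assertion message, not a confirmed cause. The sketch below is a hypothetical helper (all names are made up here): it relies only on the standard PAF column layout (query name and length in columns 1 and 2, target name and length in columns 6 and 7) and assumes four-line FASTQ records.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_lengths(path):
    """Return {read name: sequence length} for a FASTA or four-line FASTQ file."""
    lengths = {}
    with open_text(path) as handle:
        first = handle.readline()
        if first.startswith(">"):
            # FASTA: sequence may span several lines, so accumulate.
            name = first[1:].split()[0]
            lengths[name] = 0
            for line in handle:
                if line.startswith(">"):
                    name = line[1:].split()[0]
                    lengths[name] = 0
                else:
                    lengths[name] += len(line.strip())
        elif first.startswith("@"):
            # FASTQ (assumed four lines per record): header, sequence, '+', quality.
            name = first[1:].split()[0]
            for i, line in enumerate(handle, start=1):
                if i % 4 == 0 and line.strip():
                    name = line[1:].split()[0]
                elif i % 4 == 1:
                    lengths[name] = len(line.strip())
    return lengths


def paf_length_mismatches(paf_path, lengths):
    """Return (name, PAF length, file length) for reads whose lengths disagree.

    The file length is None when the read is absent from the reads file.
    """
    bad = set()
    with open_text(paf_path) as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 12:
                continue  # not a complete PAF record
            for name, paf_len in ((cols[0], int(cols[1])), (cols[5], int(cols[6]))):
                if lengths.get(name) != paf_len:
                    bad.add((name, paf_len, lengths.get(name)))
    return sorted(bad, key=lambda item: (item[0], item[1]))
```

Calling paf_length_mismatches("overlaps.paf.gz", read_lengths("reads.fq.gz")) with your own file names returns an empty list when every read mentioned in the PAF exists in the reads file with the same length. Anything it does return means the two files were not produced from the same set of reads.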