lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

P6C4 devnet: Assertion error #2

Closed by lexnederbragt 8 years ago

lexnederbragt commented 8 years ago

E. coli data from https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly caused miniasm to crash for me.

wget https://s3.amazonaws.com/files.pacb.com/datasets/secondary-analysis/e-coli-k12-P6C4/p6c4_ecoli_RSII_DDR2_with_15kb_cut_E01_1.tar.gz

Uncompress.

Convert to fastq using smrtanalysis 2.3.0:

ls *.bax.h5 >input.fofn
pls2fasta input.fofn -trimByRegion -minSubreadLength 50 -fastq -minReadScore 750 m141013_011508_sherri_c100709962550000001823135904221533_s1_p0.filtered_subreads.fastq

Convert to fasta using an in-house Perl script.
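The in-house script is not shown in this thread. As a rough stand-in, a minimal sketch of a FASTQ-to-FASTA conversion might look like the following. It assumes plain four-line, unwrapped FASTQ records; the function name `fastq_to_fasta` is hypothetical and not part of miniasm or smrtanalysis.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy 4-line FASTQ records to FASTA; return the number of records written.

    Assumption: sequence and quality each occupy exactly one line per record.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = fastq_handle.readline().rstrip("\n")
        plus = fastq_handle.readline()
        fastq_handle.readline()  # quality line, discarded
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        # FASTA header keeps the read name, with '@' replaced by '>'
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
        count += 1
    return count


if __name__ == "__main__":
    demo_in = io.StringIO("@read1\nACGT\n+\n!!!!\n")
    demo_out = io.StringIO()
    fastq_to_fasta(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

For real data the two handles would be opened files, for example the subreads fastq as input and P6C4.fasta as output.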

Used minimap@c137b17 and miniasm@24ddd20, compiled with gcc 5.2.0:

minimap/minimap -Sw5 -L100 -m0 -t8 P6C4.fasta P6C4.fasta | gzip -1 >P6C4.paf.gz
miniasm/miniasm -f P6C4.fasta P6C4.paf.gz >P6C4.gfa

miniasm output:

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::34.899*0.98] read 15801032 hits; stored 11922618 hits and 66110 sequences (690111467 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::37.896*0.98] 64162 query sequences remain after sub
[M::ma_hit_cut::38.285*0.98] 11520836 hits remain after cut
[M::ma_hit_flt::38.734*0.98] 10989741 hits remain after filtering; crude coverage after filtering: 116.71
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::39.762*0.98] 61410 query sequences remain after sub
[M::ma_hit_cut::40.115*0.98] 10727924 hits remain after cut
[M::ma_hit_contained::40.496*0.98] 1581 sequences and 22558 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 21900 arcs
[M::asg_arc_del_trans] transitively reduced 17740 arcs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 630 asymmetric arcs
[M::asg_pop_bubble] popped 107 bubbles and trimmed 13 tips
[M::asg_cut_short_utg] dropped [0,0,0] short unitigs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 22 asymmetric arcs
[M::asg_arc_del_short] removed 52 short overlaps
[M::asg_pop_bubble] popped 17 bubbles and trimmed 7 tips
[M::asg_cut_short_utg] dropped [6,0,0] short unitigs
[M::asg_cut_short_utg] dropped [1,0,0] short unitigs
[M::asg_cut_short_utg] dropped [0,0,0] short unitigs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitig graph <===
miniasm: asm.c:265: ma_ug_seq: Assertion `sub[id].e - sub[id].s < ks->seq.l' failed.

lh3 commented 8 years ago

Oh, that assertion should be: sub[id].e - sub[id].s <= ks->seq.l (that is, <=, not <).
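To illustrate why the strict comparison can fail, here is a small sketch of the boundary case. It assumes `s` and `e` are half-open coordinates of the retained part of a read (an assumption consistent with the fix, not confirmed in this thread), so a read that is kept in full has `e - s` equal to its length. The helper name `span_fits` is hypothetical.

```python
def span_fits(start, end, seq_len, strict=False):
    """Check that the retained span [start, end) fits within a read of seq_len bases.

    strict=True mimics the original assertion (<); strict=False mimics the fix (<=).
    """
    span = end - start
    return span < seq_len if strict else span <= seq_len


if __name__ == "__main__":
    read_len = 1000
    # A read kept in full: the span equals the read length.
    print("strict check passes:", span_fits(0, read_len, read_len, strict=True))
    print("relaxed check passes:", span_fits(0, read_len, read_len))
```

Under that assumption, any read that is not trimmed at either end trips the strict version, while a span longer than the read is still rejected by both.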

lexnederbragt commented 8 years ago

Tested at f8cf505 and it now finishes!