Closed lexnederbragt closed 8 years ago
E. coli data from https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly caused miniasm to crash for me.
wget https://s3.amazonaws.com/files.pacb.com/datasets/secondary-analysis/e-coli-k12-P6C4/p6c4_ecoli_RSII_DDR2_with_15kb_cut_E01_1.tar.gz
Uncompress
Convert to fastq using smrtanalysis 2.3.0:
ls *.bax.h5 >input.fofn
pls2fasta input.fofn -trimByRegion -minSubreadLength 50 -fastq -minReadScore 750 m141013_011508_sherri_c100709962550000001823135904221533_s1_p0.filtered_subreads.fastq
Convert to fasta using an in-house Perl script
Used minimap@c137b17 and miniasm@24ddd20, compiled with gcc 5.2.0:
minimap/minimap -Sw5 -L100 -m0 -t8 P6C4.fasta P6C4.fasta | gzip -1 >P6C4.paf.gz
miniasm/miniasm -f P6C4.fasta P6C4.paf.gz >P6C4.gfa
miniasm output:
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::34.899*0.98] read 15801032 hits; stored 11922618 hits and 66110 sequences (690111467 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::37.896*0.98] 64162 query sequences remain after sub
[M::ma_hit_cut::38.285*0.98] 11520836 hits remain after cut
[M::ma_hit_flt::38.734*0.98] 10989741 hits remain after filtering; crude coverage after filtering: 116.71
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::39.762*0.98] 61410 query sequences remain after sub
[M::ma_hit_cut::40.115*0.98] 10727924 hits remain after cut
[M::ma_hit_contained::40.496*0.98] 1581 sequences and 22558 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 21900 arcs
[M::asg_arc_del_trans] transitively reduced 17740 arcs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 630 asymmetric arcs
[M::asg_pop_bubble] popped 107 bubbles and trimmed 13 tips
[M::asg_cut_short_utg] dropped [0,0,0] short unitigs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 22 asymmetric arcs
[M::asg_arc_del_short] removed 52 short overlaps
[M::asg_pop_bubble] popped 17 bubbles and trimmed 7 tips
[M::asg_cut_short_utg] dropped [6,0,0] short unitigs
[M::asg_cut_short_utg] dropped [1,0,0] short unitigs
[M::asg_cut_short_utg] dropped [0,0,0] short unitigs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitig graph <===
miniasm: asm.c:265: ma_ug_seq: Assertion `sub[id].e - sub[id].s < ks->seq.l' failed.
Oh, that assertion should be: sub[id].e - sub[id].s <= ks->seq.l. <=, not <.
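For anyone wondering why the strict comparison trips, here is a minimal sketch of the boundary case. It assumes that sub[id].s and sub[id].e mark the start and end of the region kept from a read and that ks->seq.l is the read's full length; that reading is inferred from the assertion text, not taken from the miniasm source. The names interval_t, fits_strict, fits_inclusive and demo are hypothetical stand-ins, not miniasm identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kept region of a read, as a half-open
 * interval [s, e). This is NOT miniasm's actual type. */
typedef struct {
    size_t s; /* start of the kept region */
    size_t e; /* end of the kept region */
} interval_t;

/* The check as originally written: the kept region must be strictly
 * shorter than the read, so a read kept in full is rejected. */
int fits_strict(interval_t sub, size_t seq_len)
{
    return sub.e - sub.s < seq_len;
}

/* The check after the fix: the kept region may be as long as the read. */
int fits_inclusive(interval_t sub, size_t seq_len)
{
    return sub.e - sub.s <= seq_len;
}

/* Boundary cases for an assumed read length of 1000 bp. */
void demo(void)
{
    interval_t trimmed = {10, 990};  /* ends were clipped */
    interval_t whole   = {0, 1000};  /* nothing was clipped */

    assert(fits_strict(trimmed, 1000));
    assert(fits_inclusive(trimmed, 1000));

    assert(!fits_strict(whole, 1000));   /* this is the case that aborted */
    assert(fits_inclusive(whole, 1000)); /* accepted with <= */
}
```

Under these assumptions, any read that survives selection without being clipped has e - s equal to its length, which the strict form rejects even though the interval is perfectly valid.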
Tested at f8cf505 and it now finishes!