lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

Core dump error while generating unitigs (Step 5) #12

Open athulmenon opened 8 years ago

athulmenon commented 8 years ago

Hi,

I am doing a de novo assembly of nanopore data from R7.3 chemistry. First I converted the FAST5 files to FASTA using poretools. After that I followed the pipeline for de novo assembly using minimap and miniasm.

The commands I ran were:

/opt/programs/minimap-master/minimap -Sw5 -L100 -m0 Ecoli_R73.fasta Ecoli_R73.fasta | gzip -1 > Ecoli_R73.paf.gz

/opt/programs/miniasm-master/miniasm -f Ecoli_R73.fasta Ecoli_R73.paf.gz > Ecoli_R73.gfa

But an error pops up when running miniasm:

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.144*0.92] read 92809 hits; stored 127633 hits and 8123 sequences (59841178 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.159*0.92] 5645 query sequences remain after sub
[M::ma_hit_cut::0.164*0.92] 100054 hits remain after cut
[M::ma_hit_flt::0.167*0.93] 93030 hits remain after filtering; crude coverage after filtering: 9.35
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.171*0.93] 5336 query sequences remain after sub
[M::ma_hit_cut::0.175*0.93] 80419 hits remain after cut
[M::ma_hit_chimeric::0.178*0.93] identified 1 chimeric reads
[M::ma_hit_contained::0.181*0.93] 855 sequences and 5139 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 1953 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 538 arcs
[M::asg_arc_del_multi] removed 180 multi-arcs
[M::asg_arc_del_asymm] removed 13 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 420 tips
[M::asg_pop_bubble] popped 25 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 26 asymmetric arcs
[M::asg_arc_del_short] removed 28 short overlaps
[M::asg_cut_tip] cut 32 tips
[M::asg_pop_bubble] popped 2 bubbles and trimmed 0 tips
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 2 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 2 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.
Aborted (core dumped)

Can you please tell me why this happens?

Thanks in advance. Athul

lh3 commented 8 years ago

See #2 and #11. You have reads with identical names.

athulmenon commented 8 years ago

Dear lh3, thank you for your reply. I found around 18k duplicated read names by running the command you gave:

zcat input.fa.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d

Can you please suggest a potential solution for this problem? Is it caused by a mistake in the poretools conversion?

Thanks in Advance! Athul

lh3 commented 8 years ago

I don't know about poretools. To use miniasm, you can simply rename your reads to "read1", "read2", ... For FASTA, it can be done this way (not tested):

zcat input.fa.gz | awk '/^>/{print ">read"NR}!/^>/'

For FASTQ, you need to write a proper script for renaming.
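As a rough illustration of such a renaming script, here is a minimal Python sketch (not from the miniasm authors). It assumes strict four-line FASTQ records (header, sequence, `+`, quality) with no line wrapping; the function name `rename_fastq` and the `read` prefix are illustrative choices only.

```python
from typing import Iterable, Iterator


def rename_fastq(lines: Iterable[str], prefix: str = "read") -> Iterator[str]:
    """Yield FASTQ lines with records renamed to prefix1, prefix2, ...

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    record = []
    count = 0
    for line in lines:
        # Ignore blank lines between records (e.g. a trailing newline).
        if not record and not line.strip():
            continue
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        count += 1
        yield f"@{prefix}{count}"
        yield seq
        yield "+"  # drop any read name repeated on the separator line
        yield qual
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

To apply it to a file, pass an open file handle (or `sys.stdin`) to `rename_fastq` and print each yielded line. Wrapped (multi-line) FASTQ would need a real parser instead.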

rec3141 commented 7 years ago

I'm getting this as well. It seems that miniasm doesn't use the entire header line to determine uniqueness?

e.g.

>d6febb45-00aa-4de3-937e-f496fbf2e108 runid=c98c9543b58908856e1ef3707a799d283e38158c read=1557 ch=130 start_time=2017-05-05T02:55:36Z id=120218_0
>d6febb45-00aa-4de3-937e-f496fbf2e108 runid=c98c9543b58908856e1ef3707a799d283e38158c read=1557 ch=130 start_time=2017-05-05T02:55:36Z id=120218_1
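
If the read name is indeed only the first whitespace-delimited token of the header (which is what the `>(\S+)` pattern in the duplicate check earlier in this thread matches), then headers like the two above collide even though their `id=` fields differ. The following Python sketch shows one possible workaround under that assumption: it appends a counter to repeated names and keeps the rest of the header. The function name `uniquify_fasta_names` and the `_2`, `_3` suffix scheme are hypothetical, not part of miniasm.

```python
from collections import Counter
from typing import Iterable, Iterator


def uniquify_fasta_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, suffixing repeated read names with _2, _3, ...

    Assumption: the read name is the first whitespace-delimited token
    after '>'. A suffixed name could still clash with a pre-existing
    name such as 'x_2'; this sketch does not guard against that.
    """
    seen = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            yield line  # sequence line, passed through unchanged
            continue
        fields = line[1:].split(None, 1)
        name = fields[0] if fields else ""
        comment = fields[1] if len(fields) > 1 else ""
        seen[name] += 1
        if seen[name] > 1:
            name = f"{name}_{seen[name]}"
        yield f">{name} {comment}".rstrip()
```

Renaming everything to "read1", "read2", ... as suggested above is simpler; this variant only matters if the original IDs need to stay recognizable downstream.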