lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

Core Dumped while Generating Unitigs #27

Open judzen opened 7 years ago

judzen commented 7 years ago

Hello: I checked my reads for duplicate entries, as several previous issues recommended, and nothing was printed to STDOUT, so I do not appear to have duplicate entries in either my self-to-self minimap mapping (*.paf.gz) or my input FASTQ files.

I am getting the following error during de novo assembly with miniasm during Step 5: generating unitigs:

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::318.080*1.00] read 112487066 hits; stored 163806093 hits and 252748 sequences (3081524130 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::364.361*1.00] 226439 query sequences remain after sub
[M::ma_hit_cut::368.508*1.00] 160297166 hits remain after cut
[M::ma_hit_flt::374.054*1.00] 145797682 hits remain after filtering; crude coverage after filtering: 460.63
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::389.267*1.00] 225559 query sequences remain after sub
[M::ma_hit_cut::392.951*1.00] 145087145 hits remain after cut
[M::ma_hit_contained::399.349*1.00] 3193 sequences and 58708 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 39392 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 20135 arcs
[M::asg_arc_del_multi] removed 1 multi-arcs
[M::asg_arc_del_asymm] removed 1586 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 1119 tips
[M::asg_pop_bubble] popped 1 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 1381 asymmetric arcs
[M::asg_arc_del_short] removed 2873 short overlaps
[M::asg_cut_tip] cut 941 tips
[M::asg_pop_bubble] popped 10 bubbles and trimmed 7 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 155 asymmetric arcs
[M::asg_arc_del_short] removed 197 short overlaps
[M::asg_cut_tip] cut 387 tips
[M::asg_pop_bubble] popped 16 bubbles and trimmed 8 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 63 asymmetric arcs
[M::asg_arc_del_short] removed 65 short overlaps
[M::asg_cut_tip] cut 149 tips
[M::asg_pop_bubble] popped 4 bubbles and trimmed 3 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 23 internal sequences
[M::asg_cut_biloop] cut 17 small bi-loops
[M::asg_cut_tip] cut 31 tips
[M::asg_pop_bubble] popped 1 bubbles and trimmed 1 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 5 asymmetric arcs
[M::asg_arc_del_short] removed 5 short overlaps
[M::asg_cut_tip] cut 8 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.
Aborted (core dumped)

Any assistance with respect to this error would be greatly appreciated.

Thanks!

judzen commented 7 years ago

minimap version: [M::main] Version: 0.2-r124-dirty
miniasm version: 0.2-r168-dirty

judzen commented 7 years ago

line 24 of hit.c contains the following, which I presume has been corrected:

void ma_hit_mark_unused(sdict_t *d, size_t n, const ma_hit_t *a)

lh3 commented 7 years ago

Your input FASTQ has duplicated entries.

judzen commented 7 years ago

Thank you for your prompt response; it is appreciated.

After reviewing the errors that others have described during the unitig step, I performed the following to determine whether or not I had duplicate entries among the input FASTQ data:

cat my_file.fastq | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d
zcat 0000_reads.paf.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d

As a result of running these 2 commands, nothing printed to STDOUT.

If you could please advise, we would greatly appreciate it!

Thanks, J

lh3 commented 7 years ago

Your input is fastq. The command line you are quoting only works with fasta. It doesn't work with fastq.

I am very certain that you have duplicated entries. There have been no exceptions so far.
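
The one-liner quoted earlier matches `>`, which marks FASTA headers. FASTQ headers start with `@`, so that command prints nothing for FASTQ input whether or not duplicates exist. A minimal sketch of a FASTQ-aware check follows, assuming strict four-line records (no wrapped sequence lines). The function names `fastq_names` and `duplicated_names` are made up for illustration and are not part of miniasm or minimap.

```python
import gzip
from collections import Counter


def fastq_names(path):
    """Yield the read name from each record of a four-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:
                continue  # only the first line of each record is a header
            if not line.startswith("@"):
                raise ValueError(f"line {line_no + 1}: expected '@' header")
            # The name is the text after '@' up to the first whitespace.
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicated_names(path):
    """Return {name: count} for every read name seen more than once."""
    counts = Counter(fastq_names(path))
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    for name, n in sorted(duplicated_names(sys.argv[1]).items()):
        print(f"{name}\t{n}")
```

Headers are located by position (every fourth line) rather than by a leading `@`, because a quality line may also begin with `@`.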

judzen commented 7 years ago

Thank you for your reply--

After running the dedupe.sh script (from bbmap), we detected 11 duplicates, 3 invalid, and 3 contained sequences, which were removed.

We then ran minimap followed by miniasm (for assembly) and received the following error:

[M::main] ===> Step 5: generating unitigs <===
miniasm: asm.c:267: ma_ug_seq: Assertion `sub[id].e - sub[id].s <= ks->seq.l' failed.
Aborted

Thoughts/suggestions??

Thank you kindly for your prompt responses & feedback-- they are appreciated.

Sincerely, J
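
The assertion compares what appears to be the retained interval of a read (`sub[id].e - sub[id].s`) with the length of the sequence read from the reads file (`ks->seq.l`). One plausible way for it to fail, offered here as an assumption and not a confirmed diagnosis, is that a name in the PAF refers to a read whose length differs from the sequence stored under that name in the FASTQ, for example because a duplicate name remains or the PAF was generated from a different file. A rough sketch of a cross-check, assuming four-line FASTQ records and standard PAF columns (query name and length in columns 1 and 2, target name and length in columns 6 and 7). The helper names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fastq_lengths(path):
    """Map read name -> sequence length (the last record wins if a name repeats)."""
    lengths = {}
    name = None
    with open_text(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:        # header line
                fields = line[1:].split()
                name = fields[0] if fields else ""
            elif line_no % 4 == 1:      # sequence line
                lengths[name] = len(line.strip())
    return lengths


def paf_length_mismatches(paf_path, lengths):
    """Return {name: (paf_length, fastq_length_or_None)} where the two disagree."""
    bad = {}
    with open_text(paf_path) as handle:
        for line in handle:
            f = line.rstrip("\n").split("\t")
            if len(f) < 7:
                continue  # not a complete PAF record
            for name, paf_len in ((f[0], int(f[1])), (f[5], int(f[6]))):
                if lengths.get(name) != paf_len:
                    bad[name] = (paf_len, lengths.get(name))
    return bad
```

An empty result would suggest the PAF and FASTQ agree on read lengths; any entry points at a read worth inspecting by hand.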

mptrsen commented 5 years ago

Your input FASTQ has duplicated entries.

This shouldn't be a problem, nor should it make miniasm crash. If anything, mappings from duplicate reads should be ignored.

Thanks for the hint, though. I had the same problem and I know I have duplicate reads.
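
Until the tool itself ignores duplicates, one workaround is to drop repeated names before mapping. A minimal sketch, assuming uncompressed four-line FASTQ records and that keeping the first record per name is acceptable. `dedupe_fastq_by_name` is a hypothetical helper, not part of miniasm.

```python
def dedupe_fastq_by_name(in_path, out_path):
    """Copy a four-line FASTQ, keeping only the first record for each read name.

    Returns a (kept, dropped) pair of record counts.
    """
    seen = set()
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            fields = record[0][1:].split()
            name = fields[0] if fields else ""
            if name in seen:
                dropped += 1
                continue
            seen.add(name)
            dst.writelines(record)
            kept += 1
    return kept, dropped
```

The mapping would then need to be regenerated from the filtered file, so that the PAF and the reads passed to `miniasm -f` describe the same sequences.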