lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

empty gfa with "0 sequences and 0 hits remain after containment removal" #56

Open JustinaMoon opened 5 years ago

JustinaMoon commented 5 years ago

Hi

I was running a genome assembly, and miniasm produces an empty GFA file. Everything seems to be dropped in step 3.

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::44.200*0.99] read 44615892 hits; stored 34606417 hits and 1544137 sequences (13028768366 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::47.152*0.99] 1267343 query sequences remain after sub
[M::ma_hit_cut::48.001*0.99] 31551419 hits remain after cut
[M::ma_hit_flt::48.732*0.99] 31316687 hits remain after filtering; crude coverage after filtering: 17.63
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::49.739*0.99] 1252369 query sequences remain after sub
[M::ma_hit_cut::50.495*0.99] 31111563 hits remain after cut
[M::ma_hit_contained::51.561*0.99] 0 sequences and 0 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 0 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -f 20180727.Ct_1.fastq 20180727.Ct_1.paf.gz
[M::main] Real time: 238.907 sec; CPU: 70.817 sec

Does anyone know how to fix it? Thanks in advance!

RxLoutre commented 5 years ago

Hello there,

I have the same issue, and in my case it is even worse: miniasm doesn't seem to read any hits at all in step 1:

I have a 63G arabidobsis-overlap.paf.gz file and a 20G fastq file of reads. I ran minimap2 first to produce the paf.gz file, and then I'm trying to run miniasm:

sbsuser@node146: /work/sbsuser/test/roxane/miniasm $ miniasm -f /work/sbsuser/test/roxane/miniasm/reads/arabidobsis-reads.fastq arabido-overlaps.paf.gz > arabido-assembly.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::19.041*0.78] read 0 hits; stored 0 hits and 0 sequences (0 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::19.041*0.78] 0 query sequences remain after sub
[M::ma_hit_cut::19.041*0.78] 0 hits remain after cut
[M::ma_hit_flt::19.041*0.78] 0 hits remain after filtering; crude coverage after filtering: -nan
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::19.041*0.78] 0 query sequences remain after sub
[M::ma_hit_cut::19.041*0.78] 0 hits remain after cut
[M::ma_hit_contained::19.041*0.78] 0 sequences and 0 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 0 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.2-r128
[M::main] CMD: miniasm -f /work/sbsuser/test/roxane/miniasm/reads/arabidobsis-reads.fastq arabido-overlaps.paf.gz
[M::main] Real time: 47.914 sec; CPU: 39.411 sec

At first I thought it was a resource issue, but I have 160G of memory and 8 cores, which I think should be enough. Did you manage to solve your issue?

Cheers,

Roxane
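
The "read 0 hits" line in the log above suggests that miniasm parsed no overlap records from the input at all. The thread does not establish why. One thing that can be ruled out quickly is a file that is truncated or not actually in PAF format (for example, SAM output written by `minimap2 -a`). A minimal sketch in Python, assuming the standard 12-column PAF layout; the function name `count_paf_records` is hypothetical and not part of minimap2 or miniasm:

```python
import gzip


def count_paf_records(path: str, limit: int = 100000) -> int:
    """Count lines among the first `limit` that look like PAF records.

    A PAF record has at least 12 tab-separated fields, with integer
    lengths and coordinates in columns 2-4 and 7-11.
    """
    # Assumption: gzip-compressed input is recognised by its ".gz" suffix.
    opener = gzip.open if path.endswith(".gz") else open
    good = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i >= limit:
                break
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # too few columns to be a PAF record
            numeric = fields[1:4] + fields[6:11]
            if all(f.isdigit() for f in numeric):
                good += 1
    return good
```

If this returns 0 for the overlap file, the problem lies in how the file was produced rather than in miniasm itself. A non-zero count only shows that the first lines parse as PAF; it does not confirm that the overlaps are usable.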

ryandward commented 5 years ago

Same issue here. I only started experiencing it when I switched from minimap to minimap2.

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::101.238*0.99] read 183259322 hits; stored 138980044 hits and 3771372 sequences (38483514877 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::124.367*0.99] 2952780 query sequences remain after sub
[M::ma_hit_cut::131.681*0.99] 124156622 hits remain after cut
[M::ma_hit_flt::137.384*0.99] 105193859 hits remain after filtering; crude coverage after filtering: 25.15
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::142.258*0.99] 2903385 query sequences remain after sub
[M::ma_hit_cut::147.650*1.00] 103489462 hits remain after cut
[M::ma_hit_contained::153.435*1.00] 0 sequences and 0 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 0 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 0 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -f ../../pacbio_reads.fq pacbio_reads.paf
[M::main] Real time: 226.150 sec; CPU: 225.424 sec

ryandward commented 5 years ago

Update: It seems likely that the issue is caused by running minimap2 with the map-pb preset instead of ava-pb. With map-pb, when you map the reads onto themselves, every read gets a 100% match to itself.
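
One way to check an existing overlap file for this problem is to count PAF records whose query and target names are identical. A minimal sketch in Python, assuming standard tab-separated PAF with the query name in column 1 and the target name in column 6; the function name `count_self_hits` is hypothetical and not part of minimap2 or miniasm:

```python
from typing import Iterable, Tuple


def count_self_hits(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (self_hits, total_records) for an iterable of PAF lines.

    A self-hit is a record whose query name (column 1) equals its
    target name (column 6), i.e. a read aligned to itself.
    """
    self_hits = 0
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip lines that are not complete PAF records
        total += 1
        if fields[0] == fields[5]:
            self_hits += 1
    return self_hits, total


# Usage sketch (the file name is a placeholder):
#   import gzip
#   with gzip.open("reads.paf.gz", "rt") as handle:
#       print(count_self_hits(handle))
```

A large number of self-hits would be consistent with the explanation above, in which case regenerating the overlaps with `minimap2 -x ava-pb reads.fq reads.fq` is the thing to try. The thread does not confirm that self-hits are the only possible cause of an empty GFA.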

Bordeterre commented 1 year ago

I made the same mistake, and your solution worked for me as well, thanks. Unless it didn't work for @JustinaMoon or @RxLoutre, I suggest that one of the contributors close this issue as solved.