nanoporetech / pomoxis

Analysis components from Oxford Nanopore Research

mini_assemble empty target sequences error #33

Open kautto opened 5 years ago

kautto commented 5 years ago

Dear nanopore devs,

I'm having trouble getting mini_assemble to run on human (HG001/NA12878) data. I've successfully run it on smaller assemblies before, but a ~30x human genome seems to be causing problems. I'm running on an AWS instance with 96 cores and 768 GB of RAM. After the minimap2 mapping has run for a while, it eventually gets to:

[M::worker_pipeline::751.688*6.16] mapped 18095 sequences
[M::worker_pipeline::754.787*6.16] mapped 19902 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 NA12878.gfa.fa.gz NA12878.fa.gz
[M::main] Real time: 754.791 sec; CPU: 4648.586 sec; Peak RSS: 1.928 GB
[racon::Polisher::initialize] error: empty target sequences set!
[M::mm_idx_gen::0.014*0.17] collected minimizers
[M::mm_idx_gen::0.019*9.13] sorted minimizers
[M::main::0.019*9.12] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.019*9.10] mid_occ = 718917417
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.019*9.08] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::4.056*6.97] mapped 82521 sequences
[M::worker_pipeline::7.981*7.89] mapped 91941 sequences
[M::worker_pipeline::11.426*7.27] mapped 83333 sequences

The same "empty target sequences set" error then propagates until the whole run fails:

[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 racon_1_1.fa.gz NA12878.fa.gz
[M::main] Real time: 768.219 sec; CPU: 4632.751 sec; Peak RSS: 1.931 GB
[racon::Polisher::initialize] error: empty target sequences set!
[M::mm_idx_gen::0.010*0.35] collected minimizers
[M::mm_idx_gen::0.015*12.78] sorted minimizers
[M::main::0.015*12.77] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.015*12.73] mid_occ = 0
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.015*12.70] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 racon_1_3.fa.gz NA12878.fa.gz
[M::main] Real time: 740.812 sec; CPU: 4347.913 sec; Peak RSS: 1.894 GB
[racon::Polisher::initialize] error: empty target sequences set!
rm: cannot remove 'shuffled*': No such file or directory
rm: cannot remove '*paf*': No such file or directory

Any ideas where to start troubleshooting this?

Edit: The input definitely isn't empty when it starts the run.
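
One way to narrow this down is to check the intermediate file that minimap2 is asked to index, rather than the input reads. In the log above, minimap2 reports an index built for 0 target sequences from NA12878.gfa.fa.gz, so it may be worth confirming whether that file contains any records at all. The following is a minimal Python sketch, not part of pomoxis; the helper name `count_fasta_records` is hypothetical.

```python
import gzip


def count_fasta_records(path):
    """Count FASTA header lines ('>') in a plain or gzip-compressed file.

    Hypothetical helper for troubleshooting; not part of pomoxis.
    """
    # Pick the opener from the file extension; both are read in text mode.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `count_fasta_records("NA12878.gfa.fa.gz")` returning 0 would indicate that the assembly step wrote no contigs, and that the racon error is a downstream symptom rather than the root cause.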

cjw85 commented 5 years ago

Hi @kautto,

We would not recommend running mini_assemble on a human genome; this is not a use case that was considered in its design. You would be better served by a purpose-built, robust assembler such as canu, flye, or shasta.

glf20 commented 4 years ago

Hi, I have been getting a similar error running mini_assemble on a small 3.5 kb amplicon. It had been running OK, but when I filter the data to include shorter fragments it gives me a similar error:

[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -s 100 -e 3 -f AS2k_denovo_trimmed.fa.gz AS2k_denovo.paf.gz
[M::main] Real time: 3170.456 sec; CPU: 3167.859 sec
Running racon read shuffle 1...
Running round 1 consensus...
[M::mm_idx_gen::0.000*4.33] collected minimizers
[M::mm_idx_gen::0.002*12.06] sorted minimizers
[M::main::0.002*11.89] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.002*11.66] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.002*11.43] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::1.929*4.83] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -K 500M -t 48 AS2k_denovo.gfa.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 1.930 sec; CPU: 9.322 sec; Peak RSS: 0.186 GB
[racon::Polisher::initialize] error: empty target sequences set!

The command I have been running on our HPC is:

mini_assemble -i /data/freimanis/analysis_files/nanopore/Run1/Asia/2kb/filtered/filtered.fq -o denovo -p AS2k_denovo -t 48 -c

Can anyone help? I have rerun it several times and keep getting the same result.

glf20 commented 4 years ago

The full log is below:

Skipped pre-assembly correction.
Overlapping reads...
[M::mm_idx_gen::6.112*1.68] collected minimizers
[M::mm_idx_gen::6.751*2.87] sorted minimizers
[M::main::6.751*2.87] loaded/built the index for 67078 target sequence(s)
[M::mm_mapopt_update::6.875*2.84] mid_occ = 18774
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 67078
[M::mm_idx_stat::6.950*2.82] distinct minimizers: 4409671 (68.07% are singletons); average occurrences: 14.201; average spacing: 2.979
[M::worker_pipeline::4217.177*5.55] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x ava-ont -K 500M -t 48 AS2k_denovo_trimmed.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 4217.266 sec; CPU: 23396.592 sec; Peak RSS: 100.374 GB
Assembling graph...
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::2278.666*1.00] read 1251922617 hits; stored 2503845219 hits and 67077 sequences (186573504 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::2712.484*1.00] 67077 query sequences remain after sub
[M::ma_hit_cut::2766.599*1.00] 2503589437 hits remain after cut
[M::ma_hit_flt::2851.754*1.00] 2468450488 hits remain after filtering; crude coverage after filtering: 22268.75
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::3023.405*1.00] 67075 query sequences remain after sub
[M::ma_hit_cut::3076.853*1.00] 2468279139 hits remain after cut
[M::ma_hit_contained::3165.616*1.00] 21 sequences and 26 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 10 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 17 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -s 100 -e 3 -f AS2k_denovo_trimmed.fa.gz AS2k_denovo.paf.gz
[M::main] Real time: 3170.456 sec; CPU: 3167.859 sec
Running racon read shuffle 1...
Running round 1 consensus...
[M::mm_idx_gen::0.000*4.33] collected minimizers
[M::mm_idx_gen::0.002*12.06] sorted minimizers
[M::main::0.002*11.89] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.002*11.66] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.002*11.43] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::1.929*4.83] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -K 500M -t 48 AS2k_denovo.gfa.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 1.930 sec; CPU: 9.322 sec; Peak RSS: 0.186 GB
[racon::Polisher::initialize] error: empty target sequences set!
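
In a log this long it can be hard to see where the targets disappear. Here the overlap stage indexes 67078 sequences, while the consensus stage indexes 0, which suggests (though the log alone does not prove it) that the miniasm step produced no unitigs. The sketch below pulls the target counts out of the minimap2 "loaded/built the index" lines so the first empty stage stands out. The function names are hypothetical and the regular expression assumes the message wording shown in this thread.

```python
import re

# Matches the minimap2 message seen in the logs above (wording assumed stable).
INDEX_LINE = re.compile(r"loaded/built the index for (\d+) target sequence\(s\)")


def index_target_counts(log_text):
    """Return the target-sequence count from each minimap2 index message, in order."""
    return [int(n) for n in INDEX_LINE.findall(log_text)]


def first_empty_index(log_text):
    """Return the position of the first index built on 0 targets, or None."""
    for position, count in enumerate(index_target_counts(log_text)):
        if count == 0:
            return position
    return None
```

Because the pattern is searched across the whole text, this also works on logs that were pasted as a single unbroken line.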