nanoporetech / pomoxis

Analysis components from Oxford Nanopore Research

mini_assemble on short reads. #31

Closed cjw85 closed 4 years ago

cjw85 commented 5 years ago

Hello. In my case, I'm trying to run medaka on short reads (300 to 600 bp). I have version 0.6.0; I tried downloading from the GitHub repository and it is the same version, so I used 0.6.0. The pipeline ran, but I couldn't obtain a result (I get the FASTA file, but it is empty).

mini_assemble -i ${BASECALLS} -o /home/ivan/Escritorio/medaka -p assm -t ${NPROC}
Copying FASTX input to workspace: /home/ivan/medaka/C26_carb_f.fasta > /home/ivan/Escritorio/medaka/assm.fa.gz

Skipped adapter trimming.
Skipped pre-assembly correction.
Overlapping reads...
[M::mm_idx_gen::0.024*1.07] collected minimizers
[M::mm_idx_gen::0.030*1.94] sorted minimizers
[M::main::0.030*1.94] loaded/built the index for 1428 target sequence(s)
[M::mm_mapopt_update::0.031*1.92] mid_occ = 487
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1428
[M::mm_idx_stat::0.032*1.90] distinct minimizers: 20842 (0.00% are singletons); average occurrences: 6.932; average spacing: 3.071
[M::worker_pipeline::0.757*5.43] mapped 1428 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -x ava-ont -K 500M -t 8 assm.fa.gz assm.fa.gz
[M::main] Real time: 0.762 sec; CPU: 4.120 sec; Peak RSS: 0.035 GB
Assembling graph...
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.208*1.00] read 243144 hits; stored 486288 hits and 680 sequences (210170 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.260*1.00] 680 query sequences remain after sub
[M::ma_hit_cut::0.267*1.00] 486288 hits remain after cut
[M::ma_hit_flt::0.275*1.00] 484120 hits remain after filtering; crude coverage after filtering: 525.25
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.295*1.00] 680 query sequences remain after sub
[M::ma_hit_cut::0.303*1.00] 484120 hits remain after cut
[M::ma_hit_contained::0.311*1.00] 10 sequences and 32 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 16 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 10 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -s 100 -e 3 -f assm.fa.gz assm.paf.gz
[M::main] Real time: 0.315 sec; CPU: 0.316 sec
Running racon read shuffle 1...
Running round 1 consensus...
[M::mm_idx_gen::0.000*2.72] collected minimizers
[M::mm_idx_gen::0.001*3.95] sorted minimizers
[M::main::0.001*3.91] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.001*3.81] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.001*3.73] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::0.007*3.27] mapped 1428 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 8 assm.gfa.fa.gz assm.fa.gz
[M::main] Real time: 0.008 sec; CPU: 0.024 sec; Peak RSS: 0.003 GB
[racon::Polisher::initialize] error: empty target sequences set!
Running round 2 consensus...
[M::mm_idx_gen::0.000*4.02] collected minimizers
[M::mm_idx_gen::0.001*4.91] sorted minimizers
[M::main::0.001*4.87] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.001*4.74] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.001*4.61] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::0.007*3.37] mapped 1428 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 8 racon_1_1.fa.gz assm.fa.gz
[M::main] Real time: 0.008 sec; CPU: 0.025 sec; Peak RSS: 0.003 GB
[racon::Polisher::initialize] error: empty target sequences set!
Running round 3 consensus...
[M::mm_idx_gen::0.000*3.61] collected minimizers
[M::mm_idx_gen::0.001*4.48] sorted minimizers
[M::main::0.001*4.44] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.001*4.34] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.001*4.25] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::0.008*3.07] mapped 1428 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 8 racon_1_2.fa.gz assm.fa.gz
[M::main] Real time: 0.008 sec; CPU: 0.024 sec; Peak RSS: 0.003 GB
[racon::Polisher::initialize] error: empty target sequences set!
Running round 4 consensus...
[M::mm_idx_gen::0.000*3.84] collected minimizers
[M::mm_idx_gen::0.001*4.45] sorted minimizers
[M::main::0.001*4.42] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.001*4.29] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.001*4.18] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::0.008*3.18] mapped 1428 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 8 racon_1_3.fa.gz assm.fa.gz
[M::main] Real time: 0.008 sec; CPU: 0.026 sec; Peak RSS: 0.003 GB
[racon::Polisher::initialize] error: empty target sequences set!
Waiting for cleanup.
rm: can not delete 'shuffled ': the file or directory does not exist
rm: can not delete ' paf ': the file or directory does not exist
Final assembly written in /home/ivan/medaka_2/assm_final.fa. You have a good day.

I checked the pipeline with the tutorial data and it works perfectly. Could it be that in this case I have to adjust something in minimap2? The command minimap2 -K 500M -t 8 racon_1_3.fa.gz assm.fa.gz appears in the messages every time. I checked that the final length of consensus.fasta from the tutorial data is 47018010(4700M). Could this be the reason I don't get a consensus with my data? Do you know how to modify the minimap2 requirements, or any other requirements, to get results with reads this small?

Thank you very much

Originally posted by @Ivanbh214 in https://github.com/nanoporetech/medaka/issues/31#issuecomment-472214645

cjw85 commented 5 years ago

In the output above, miniasm is failing to build an assembly. minimap2 then naturally fails with:

[M::main::0.001*3.91] loaded/built the index for 0 target sequence(s)
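One way to confirm this before the polishing rounds is to count the records in the intermediate assembly file. This is only a minimal sketch: the helper name count_fasta_records is hypothetical and not part of pomoxis, and the file name in the usage comment is assumed from the minimap2 CMD line in the log.

```shell
#!/bin/sh
# Hypothetical helper (not part of pomoxis): count FASTA records in a
# plain or gzip-compressed file by counting header lines.
count_fasta_records() {
    # With -f, gzip -cd passes input that is not gzip-compressed through unchanged.
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    gzip -cdf "$1" | grep -c '^>' || true
}

# Usage, assuming the intermediate file name seen in the log:
#   n=$(count_fasta_records assm.gfa.fa.gz)
#   [ "$n" -gt 0 ] || echo "miniasm produced no unitigs" >&2
```

A count of 0 here would match the "0 target sequence(s)" message and the later racon error about an empty target sequences set.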

With your dataset I would advise running these tools separately outside of the mini_assemble script since it is not designed with such short reads in mind. You might also like to try wtdbg2 to assemble your reads, after which you can use mini_assemble with the -r option to run the subsequent steps, finally running medaka on the output.
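As a rough sketch of what running the first steps separately might look like: the functions below wrap the minimap2 and miniasm invocations with the flags copied from the CMD lines in the log, and convert the resulting GFA to FASTA. The function names and file arguments are hypothetical, and the log does not settle which miniasm thresholds suit 300 to 600 bp reads, so treat the values as the ones the script happened to use rather than as recommendations.

```shell
#!/bin/sh
# Sketch only: function names and file arguments are hypothetical.
# The minimap2/miniasm flags are copied from the CMD lines in the log.

# All-vs-all read overlaps, written as gzip-compressed PAF on stdout.
# usage: overlap_reads reads.fa.gz threads > reads.paf.gz
overlap_reads() {
    minimap2 -x ava-ont -K 500M -t "$2" "$1" "$1" | gzip -c
}

# Assembly graph layout; prints GFA on stdout.
# usage: layout_reads reads.fa.gz reads.paf.gz > assm.gfa
layout_reads() {
    miniasm -s 100 -e 3 -f "$1" "$2"
}

# Convert GFA segment (S) lines on stdin to FASTA records on stdout.
# usage: gfa_to_fasta < assm.gfa > assm.gfa.fa
gfa_to_fasta() {
    awk '$1 == "S" { print ">" $2; print $3 }'
}
```

Running the steps one at a time makes it possible to inspect the GFA for S lines before polishing, and to change the miniasm settings or swap in another assembler without touching the rest.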

cjw85 commented 4 years ago

Closing due to inactivity.