nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

medaka_consensus run without pyabpoa #486

Closed greenprotist closed 3 months ago

greenprotist commented 6 months ago

Hi, I have been running medaka on a Flye-assembled eukaryote genome (~200 Mbp) with basecalled nanopore reads (~17 Gbp). This is the command line I used:

medaka_consensus -d /path/to/the/flye/assembly/fasta/file -i /path/to/base/called/nano/pore/reads -o medaka_out -m r941_min_fast_g303 -t 2

The run finished and created consensus.fasta among other files. However, the original and corrected files look the same (only the sequence order is rearranged), so I wonder whether medaka made any corrections.
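One way to check whether any contig actually changed, regardless of record order, is to compare the two FASTA files by contig name. The following is a minimal sketch, not part of medaka: the helper names `parse_fasta` and `changed_contigs` are hypothetical, and it assumes that contig names are the same in the draft and in consensus.fasta and that case differences (soft masking) should be ignored.

```python
import sys
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record name: uppercase sequence} from FASTA-formatted lines.

    The record name is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            # Upper-case so soft-masked bases do not count as differences
            # (an assumption of this sketch).
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def changed_contigs(draft: Dict[str, str], polished: Dict[str, str]) -> List[str]:
    """Sorted names present in both inputs whose sequences differ."""
    shared = draft.keys() & polished.keys()
    return sorted(n for n in shared if draft[n] != polished[n])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_fasta.py draft.fasta consensus.fasta
    with open(sys.argv[1]) as d, open(sys.argv[2]) as p:
        draft_seqs, polished_seqs = parse_fasta(d), parse_fasta(p)
    changed = changed_contigs(draft_seqs, polished_seqs)
    print(f"{len(changed)} of {len(draft_seqs)} contigs differ")
```

If this reports that 0 contigs differ, the polished file is a reordered copy of the draft. The sketch holds both files in memory, which should be manageable for a ~200 Mbp assembly.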

This is what the first lines of the log file look like (note the warnings that pyabpoa cannot be imported):

TF_CPP_MIN_LOG_LEVEL is set to '3'
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Checking program versions
This is medaka 1.11.3
Cannot import pyabpoa, some features may not be available.
Program    Version    Required   Pass     
bcftools   1.11       1.11       True     
bgzip      1.19       1.11       True     
minimap2   2.26       2.11       True     
samtools   1.11       1.11       True     
tabix      1.19       1.11       True     
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Aligning basecalls to draft
Using the existing fai index file /nas3/ekim1/flye/telonema/run2/assembly.fasta.fai
Using the existing mmi index file /nas3/ekim1/flye/telonema/run2/assembly.fasta.map-ont.mmi
[M::mm_idx_gen::0.001*10.83] collected minimizers
[M::mm_idx_gen::0.001*5.33] sorted minimizers
[M::main::0.001*5.20] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.002*5.04] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.002*4.89] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan; total length: 0
[M::worker_pipeline::11.692*2.09] mapped 61011 sequences

The following are the final lines:

[08:07:08 - Stitcher] Copying contig 'contig_1634' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_1151' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_793' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_1370' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_468' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_743' verbatim from input.
[08:07:08 - Stitcher] Copying contig 'contig_1036' verbatim from input.
Polished assembly written to medaka_telo_run2b/consensus.fasta, have a nice day.
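The Stitcher lines above can be tallied to see how many contigs were copied through unchanged. This is a minimal sketch, assuming the log wording is exactly as quoted in this thread (other medaka versions may phrase it differently); the function name `verbatim_contigs` is hypothetical and not part of medaka.

```python
import re
from typing import Iterable, List

# Pattern taken from the log lines quoted in this thread; it is an
# assumption that other medaka versions use the same wording.
VERBATIM = re.compile(r"Stitcher\] Copying contig '([^']+)' verbatim from input")


def verbatim_contigs(log_lines: Iterable[str]) -> List[str]:
    """Return the names of contigs the log reports as copied verbatim."""
    names: List[str] = []
    for line in log_lines:
        match = VERBATIM.search(line)
        if match:
            names.append(match.group(1))
    return names
```

For example, `len(verbatim_contigs(open("medaka.log")))` (with a hypothetical log file name) gives a count that can be compared against the total number of contigs in the draft assembly.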

My question is: was this run successful despite not being able to import pyabpoa? I've tried to install pyabpoa on the remote server, but so far without success. If this medaka_consensus run doesn't require pyabpoa, I assume I can proceed with the output consensus.fasta? I'd appreciate your comments. Many thanks. Happy New Year!

cjw85 commented 3 months ago

This is the expected behaviour. pyabpoa is required only for some niche functionality, hence the warnings. The fact that your output ends with

Polished assembly written to medaka_telo_run2b/consensus.fasta, have a nice day.

suggests to me everything ran normally.