nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com
Other
391 stars 73 forks source link

RuntimeWarning: divide by zero encountered in log10 #368

Closed wanyuac closed 10 months ago

wanyuac commented 2 years ago

Describe the bug There are always runtime warnings of "divide by zero encountered in log10" in each run.

Logging

(medaka1.6) ywan@ICL:~/assembly/medaka$ medaka_consensus -i nanopore.fastq.gz -d /home/ywan/assembly/raw/raven_assembly.fasta -o tmp -m r941_min_hac_g507 -t 4 -f
Checking program versions
This is medaka 1.6.0
Program    Version    Required   Pass     
bcftools   1.15.1     1.11       True     
bgzip      1.15.1     1.11       True     
minimap2   2.24       2.11       True     
samtools   1.15.1     1.11       True     
tabix      1.15.1     1.11       True     
Aligning basecalls to draft
Creating fai index file /home/ywan/assembly/raw/raven_assembly.fasta.fai
Creating mmi index file /home/ywan/assembly/raw/raven_assembly.fasta.map-ont.mmi
[M::mm_idx_gen::0.071*1.02] collected minimizers
[M::mm_idx_gen::0.084*1.32] sorted minimizers
[M::main::0.105*1.25] loaded/built the index for 2 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.111*1.24] distinct minimizers: 481296 (97.95% are singletons); average occurrences: 1.035; average spacing: 5.336; total length: 2658491
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -I 16G -x map-ont -d /home/ywan/assembly/raw/raven_assembly.fasta.map-ont.mmi /home/ywan/assembly/raw/raven_assembly.fasta
[M::main] Real time: 0.117 sec; CPU: 0.144 sec; Peak RSS: 0.029 GB
[M::main::0.029*1.02] loaded/built the index for 2 target sequence(s)
[M::mm_mapopt_update::0.036*1.02] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.041*1.01] distinct minimizers: 481296 (97.95% are singletons); average occurrences: 1.035; average spacing: 5.336; total length: 2658491
[M::worker_pipeline::12.458*3.06] mapped 38732 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 4 -a /home/ywan/assembly/raw/raven_assembly.fasta.map-ont.mmi /home/ywan/assembly/raw/nanopore.fastq.gz
[M::main] Real time: 12.461 sec; CPU: 38.112 sec; Peak RSS: 0.467 GB
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
Running medaka consensus
[20:04:43 - Predict] Reducing threads to 2, anymore is a waste.
[20:04:43 - Predict] It looks like you are running medaka without a GPU and attempted to set a high number of threads. We have scaled this down to an optimal number. If you wish to improve performance please see https://nanoporetech.github.io/medaka/installation.html#improving-parallelism.
[20:04:43 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[20:04:43 - Predict] Processing region(s): Utg1030:0-2652846 Utg1032:0-5645
[20:04:43 - Predict] Using model: /home/ywan/anaconda3/envs/medaka1.6/lib/python3.8/site-packages/medaka/data/r941_min_hac_g507_model.tar.gz.
[20:04:43 - Predict] Processing 4 long region(s) with batching.
[20:04:44 - MdlStrTF] Model <tensorflow.python.keras.engine.sequential.Sequential object at 0x7fa3d28ec4c0>
[20:04:44 - MdlStrTF] loading weights from /tmp/tmp2580hym8/model/variables/variables
[20:04:44 - BAMFile] Creating pool of 16 BAM file sets.
[20:04:44 - Sampler] Initializing sampler for consensus of region Utg1030:0-1000000.
[20:04:44 - Sampler] Initializing sampler for consensus of region Utg1030:999000-1999000.
[20:04:44 - PWorker] Running inference for 2.7M draft bases.
[20:04:45 - Feature] Processed Utg1030:999000.0-1998999.0 (median depth 37.0)
[20:04:45 - Sampler] Took 1.59s to make features.
[20:04:45 - Sampler] Initializing sampler for consensus of region Utg1030:1998000-2652846.
[20:04:45 - Feature] Processed Utg1030:0.0-999999.0 (median depth 44.0)
[20:04:45 - Sampler] Took 1.72s to make features.
[20:04:45 - Sampler] Initializing sampler for consensus of region Utg1032:0-5645.
[20:04:45 - Feature] Processed Utg1032:0.0-5644.0 (median depth 238.0)
[20:04:45 - Sampler] Took 0.19s to make features.
[20:04:46 - Feature] Processed Utg1030:1998000.0-2652845.0 (median depth 42.0)
[20:04:46 - Sampler] Took 0.84s to make features.
[20:05:02 - PWorker] Batches in cache: 5.
[20:05:02 - PWorker] 23.5% Done (0.6/2.7 Mbases) in 18.4s
[20:05:17 - PWorker] Batches in cache: 4.
[20:05:17 - PWorker] 46.1% Done (1.2/2.7 Mbases) in 33.1s
[20:05:31 - PWorker] Batches in cache: 3.
[20:05:31 - PWorker] 67.4% Done (1.8/2.7 Mbases) in 47.7s
[20:05:45 - PWorker] Batches in cache: 2.
[20:05:45 - PWorker] 89.8% Done (2.4/2.7 Mbases) in 61.8s
[20:05:54 - PWorker] Batches in cache: 1.
[20:05:54 - PWorker] Processed 5 batches
[20:05:54 - PWorker] All done, 0 remainder regions.
[20:05:54 - Predict] Finished processing all regions.
[20:05:56 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:05:56 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:05:56 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:05:56 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:05:56 - DataIndx] Loaded 1/1 (100.00%) sample files.
/home/ywan/anaconda3/envs/medaka1.6/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/ywan/anaconda3/envs/medaka1.6/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/ywan/anaconda3/envs/medaka1.6/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/ywan/anaconda3/envs/medaka1.6/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
Polished assembly written to tmp/consensus.fasta, have a nice day.

Environment (if you do not have a GPU, write No GPU):

afvrbanac commented 2 years ago

I see the same warning, on Ubuntu 20.04 LTS and just installed medaka via conda

Edit: Not sure if this could be relevant, but I used Guppy 6.1.2 for basecalling and am using the medaka model r104_e81_hac_g5015.

naturepoker commented 2 years ago

I have the same warning as well -

Pop!_OS 22.04, and just updated medaka with conda (in its separate environment). I'm pretty sure I didn't get this message with 1.5.0

The data's been basecalled with guppy 6.0.1 with dna_r9.4.1_450bps_sup.cfg model.

cjw85 commented 2 years ago

The warning cam be safely ignored. I'm not sure why the code emits it because if I recall correctly there is a guard against it happening. We'll take a look.

cjw85 commented 10 months ago

I believe this has been resolved.