nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Weird behavior on docker container / medaka_consensus #428

Closed znruss closed 10 months ago

znruss commented 1 year ago

Describe the bug

Ran the following command:

BASECALLS=/mnt/scratch2/NGS6811_unmodified.fastq
DRAFT=/mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta
OUTDIR=/mnt/scratch2/NGS6811_consensus1
docker run --volume /mnt/scratch2:/mnt/scratch2 --rm --gpus all ontresearch/medaka:latest medaka_consensus  -b 150 -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t 20 -m r1041_e82_260bps_sup_g632

Got the following concerning lines back:

[20:44:42 - Predict] Reducing threads to 2, anymore is a waste.
[20:44:42 - Predict] It looks like you are running medaka without a GPU and attempted to set a high number of threads. We have scaled this down to an optimal number. If you wish to improve performance please see https://nanoporetech.github.io/medaka/installation.html#improving-parallelism.
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)

Logging

Checking program versions
This is medaka 1.7.2
Program    Version    Required   Pass     
bcftools   1.16       1.11       True     
bgzip      1.16       1.11       True     
minimap2   2.24       2.11       True     
samtools   1.16.1     1.11       True     
tabix      1.16       1.11       True     
Aligning basecalls to draft
Creating fai index file /mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta.fai
Creating mmi index file /mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta.map-ont.mmi
[M::mm_idx_gen::0.086*1.01] collected minimizers
[M::mm_idx_gen::0.106*1.39] sorted minimizers
[M::main::0.154*1.27] loaded/built the index for 4 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 4
[M::mm_idx_stat::0.163*1.26] distinct minimizers: 967359 (95.38% are singletons); average occurrences: 1.078; average spacing: 5.350; total length: 5576691
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -I 16G -x map-ont -d /mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta.map-ont.mmi /mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta
[M::main] Real time: 0.167 sec; CPU: 0.208 sec; Peak RSS: 0.057 GB
[M::main::0.034*1.02] loaded/built the index for 4 target sequence(s)
[M::mm_mapopt_update::0.045*1.01] mid_occ = 11
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 4
[M::mm_idx_stat::0.053*1.01] distinct minimizers: 967359 (95.38% are singletons); average occurrences: 1.078; average spacing: 5.350; total length: 5576691
[M::worker_pipeline::13.124*18.38] mapped 86471 sequences
[M::worker_pipeline::22.334*19.13] mapped 81614 sequences
[M::worker_pipeline::32.122*19.37] mapped 83707 sequences
[M::worker_pipeline::41.873*19.52] mapped 82430 sequences
[M::worker_pipeline::51.260*19.62] mapped 80575 sequences
[M::worker_pipeline::60.338*19.69] mapped 81490 sequences
[M::worker_pipeline::68.414*19.21] mapped 93407 sequences
[M::worker_pipeline::68.654*19.15] mapped 5413 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 20 -a /mnt/scratch2/NGS6811_assembly/NGS6811.contigs.filtered.fasta.map-ont.mmi /mnt/scratch2/NGS6811_unmodified.fastq
[M::main] Real time: 68.727 sec; CPU: 1314.490 sec; Peak RSS: 11.474 GB
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
Running medaka consensus
[20:44:42 - Predict] Reducing threads to 2, anymore is a waste.
[20:44:42 - Predict] It looks like you are running medaka without a GPU and attempted to set a high number of threads. We have scaled this down to an optimal number. If you wish to improve performance please see https://nanoporetech.github.io/medaka/installation.html#improving-parallelism.
[20:44:42 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[20:44:42 - Predict] Processing region(s): tig00000001:0-5405153 tig00000022:0-105621 tig00000043:0-7272 tig00000053:0-58645
[20:44:42 - Predict] Using model: /home/epi2melabs/conda/lib/python3.8/site-packages/medaka/data/r1041_e82_260bps_sup_g632_model.tar.gz.
[20:44:42 - Predict] Processing 9 long region(s) with batching.
[20:44:42 - MdlStrTF] Model <keras.engine.sequential.Sequential object at 0x7f3ab2b18970>
[20:44:42 - MdlStrTF] loading weights from /tmp/tmphlja2285/model/variables/variables
[20:44:42 - BAMFile] Creating pool of 16 BAM file sets.
[20:44:42 - Sampler] Initializing sampler for consensus of region tig00000001:0-1000000.
[20:44:42 - Sampler] Initializing sampler for consensus of region tig00000001:999000-1999000.
[20:44:42 - PWorker] Running inference for 5.6M draft bases.
[20:44:50 - Feature] Processed tig00000001:999000.0-1998999.1 (median depth 664.0)
[20:44:50 - Sampler] Took 7.64s to make features.
[20:44:50 - Feature] Processed tig00000001:0.0-999999.0 (median depth 622.0)
[20:44:50 - Sampler] Took 7.77s to make features.
[20:44:50 - Sampler] Initializing sampler for consensus of region tig00000001:1998000-2998000.
[20:44:51 - Sampler] Initializing sampler for consensus of region tig00000001:2997000-3997000.
[20:44:54 - Feature] Processed tig00000001:2997000.0-3996999.2 (median depth 526.0)
[20:44:54 - Sampler] Took 3.10s to make features.
[20:44:54 - Sampler] Initializing sampler for consensus of region tig00000001:3996000-4996000.
[20:44:54 - Feature] Processed tig00000001:1998000.0-2997999.2 (median depth 627.0)
[20:44:54 - Sampler] Took 3.69s to make features.
[20:44:54 - Sampler] Initializing sampler for consensus of region tig00000001:4995000-5405153.
[20:44:56 - Feature] Pileup counts do not span requested region, requested tig00000001:4995000-5405153, received 4995000-5405150.
[20:44:56 - Feature] Processed tig00000001:4995000.0-5405150.0 (median depth 550.0)
[20:44:56 - Sampler] Took 1.74s to make features.
[20:44:56 - Sampler] Initializing sampler for consensus of region tig00000022:0-105621.
[20:44:57 - Feature] Processed tig00000001:3996000.0-4995999.2 (median depth 509.0)
[20:44:57 - Sampler] Took 3.03s to make features.
[20:44:57 - Sampler] Initializing sampler for consensus of region tig00000043:0-7272.
[20:44:57 - Feature] Pileup counts do not span requested region, requested tig00000022:0-105621, received 0-105618.
[20:44:57 - Feature] Processed tig00000022:0.0-105618.0 (median depth 365.0)
[20:44:57 - Sampler] Took 1.02s to make features.
[20:44:57 - Sampler] Initializing sampler for consensus of region tig00000053:0-58645.
[20:44:58 - Feature] Processed tig00000053:0.0-58644.0 (median depth 610.0)
[20:44:58 - Sampler] Took 1.14s to make features.
[20:44:58 - Feature] Pileup counts do not span requested region, requested tig00000043:0-7272, received 0-7269.
[20:44:58 - Feature] Processed tig00000043:0.0-7269.0 (median depth 7227.0)
[20:44:58 - Sampler] Took 1.51s to make features.
[20:45:08 - PWorker] Batches in cache: 8.
[20:45:08 - PWorker] 8.2% Done (0.5/5.6 Mbases) in 26.1s
[20:45:20 - PWorker] Batches in cache: 8.
[20:45:20 - PWorker] 16.4% Done (0.9/5.6 Mbases) in 37.3s
[20:45:31 - PWorker] Batches in cache: 8.
[20:45:31 - PWorker] 23.8% Done (1.3/5.6 Mbases) in 48.3s
[20:45:42 - PWorker] Batches in cache: 8.
[20:45:42 - PWorker] 31.4% Done (1.8/5.6 Mbases) in 59.4s
[20:45:53 - PWorker] Batches in cache: 8.
[20:45:53 - PWorker] 39.5% Done (2.2/5.6 Mbases) in 70.5s
[20:46:04 - PWorker] Batches in cache: 8.
[20:46:04 - PWorker] 47.7% Done (2.7/5.6 Mbases) in 81.7s
[20:46:15 - PWorker] Batches in cache: 7.
[20:46:15 - PWorker] 56.4% Done (3.1/5.6 Mbases) in 92.9s
[20:46:26 - PWorker] Batches in cache: 6.
[20:46:26 - PWorker] 63.2% Done (3.5/5.6 Mbases) in 103.9s
[20:46:37 - PWorker] Batches in cache: 5.
[20:46:37 - PWorker] 71.3% Done (4.0/5.6 Mbases) in 115.0s
[20:46:48 - PWorker] Batches in cache: 4.
[20:46:48 - PWorker] 79.6% Done (4.4/5.6 Mbases) in 126.1s
[20:47:00 - PWorker] Batches in cache: 3.
[20:47:00 - PWorker] 88.1% Done (4.9/5.6 Mbases) in 137.3s
[20:47:11 - PWorker] Batches in cache: 2.
[20:47:11 - PWorker] 96.8% Done (5.4/5.6 Mbases) in 148.5s
[20:47:18 - PWorker] Batches in cache: 1.
[20:47:19 - PWorker] Processed 13 batches
[20:47:19 - PWorker] All done, 0 remainder regions.
[20:47:19 - Predict] Finished processing all regions.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
[20:47:20 - DataIndx] Loaded 1/1 (100.00%) sample files.
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
Polished assembly written to /mnt/scratch2/NGS6811_consensus1/consensus.fasta, have a nice day.


cjw85 commented 1 year ago

This is the correct behaviour; the help link needs updating, however. The correct link is now: https://github.com/nanoporetech/medaka#improving-parallelism
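
For reference, the "Setting tensorflow inter/intra-op threads to 2/1" line in the log above refers to TensorFlow's thread-pool settings. A minimal sketch of that kind of configuration (illustrative only, not medaka's actual code) looks like:

```python
# Illustrative only: how TensorFlow thread pools are commonly configured.
# The 2/1 values mirror the "inter/intra-op threads to 2/1" log line;
# medaka's own implementation may differ.
import tensorflow as tf

tf.config.threading.set_inter_op_parallelism_threads(2)  # ops that may run concurrently
tf.config.threading.set_intra_op_parallelism_threads(1)  # threads used within a single op
```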

znruss commented 1 year ago

Why is it saying I'm not using a GPU, though? Running nvidia-smi in this docker container shows that the RTX 3090 is visible in the container.

znruss commented 1 year ago

Also, the divide-by-zero warnings feel like incorrect behavior.

cjw85 commented 1 year ago

Over the years we've learnt it's difficult to diagnose GPU errors remotely, so you might be on your own here. Something I would say, however, is that the speed of the processing would indicate that a GPU is being used. Can you watch the output of nvidia-smi while medaka is running and see if anything pops into existence?

The divide-by-zero warnings result from some niggly floating-point precision issues. I'll see about fixing those.
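
For context, the warning comes from the Phred-style quality computation shown in the traceback; a minimal sketch of why it fires, with a common clipping workaround (illustrative, not necessarily the fix medaka will use), is:

```python
# Minimal sketch of the warning seen in labels.py, plus a common workaround.
# The clip floor (1e-10) is illustrative, not medaka's actual fix.
import numpy as np

err = np.array([0.1, 1e-4, 0.0])   # an error probability of exactly 0 ...
q = -10 * np.log10(err)            # ... triggers "divide by zero encountered in log10" and yields inf

q_safe = -10 * np.log10(np.clip(err, 1e-10, None))  # flooring err keeps q finite
print(q, q_safe)
```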

znruss commented 1 year ago

No worries - CUDA errors are absolutely the most annoying things I have worked with. I ran nvidia-smi -lms 500 and grepped history as the program ran. Nothing showed up besides the desktop window manager. Thanks again for the insight on the divide by zero.

By way of troubleshooting, I ran:

docker run --rm --gpus all ontresearch/medaka:latest python3 -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"

and got back:

[]

I also tried:

docker run --rm --gpus all ontresearch/medaka:latest python3 -c "import tensorflow as tf;print(tf.sysconfig.get_build_info(),tf.__version__)"

and got back:

OrderedDict([('is_cuda_build', False), ('is_rocm_build', False), ('is_tensorrt_build', False)]) 2.7.1

Whereas:

docker run --rm --gpus all nvcr.io/nvidia/tensorflow:23.03-tf2-py3 python3 -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"

got back:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I also tried:

docker run --rm --gpus all nvcr.io/nvidia/tensorflow:23.03-tf2-py3 python3 -c "import tensorflow as tf;print(tf.sysconfig.get_build_info(),tf.__version__)"

which got back:

OrderedDict([('cpu_compiler', '/opt/rh/devtoolset-9/root/usr/bin/gcc'), ('cuda_compute_capabilities', ['sm_52', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_90']), ('cuda_version', '12.1'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)]) 2.11.0

I've tried finding TensorFlow builds that support the CUDA version I'm running, and so far that's the only convenient one (even pip3 install tensorflow on bare metal gave 2.12, but not compiled for CUDA). For what it's worth, building the container off nvcr.io/nvidia/tensorflow:23.03-tf2-py3 may be useful to solve the problem. But CPU-only performance does seem pretty good. Thanks again for looking into it!
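
The key distinction in the outputs above is whether the TensorFlow wheel inside the image was built with CUDA at all, as opposed to the GPU merely not being visible. A small standalone diagnostic along those lines (not part of medaka) could be:

```python
# Illustrative diagnostic (not part of medaka): separate a CPU-only TensorFlow
# build from a GPU that simply is not visible inside the container.
import tensorflow as tf

build = tf.sysconfig.get_build_info()
gpus = tf.config.list_physical_devices('GPU')

if not build.get('is_cuda_build', False):
    print("TensorFlow wheel built without CUDA; no GPU can be used.")
elif not gpus:
    print("CUDA build, but no GPU visible; check --gpus, the driver, and CUDA/cuDNN versions.")
else:
    print("GPU(s) available:", gpus)
```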

cjw85 commented 10 months ago

Running your examples above, I get the results I would expect:

$ docker run --env TF_CPP_MIN_LOG_LEVEL=3 --rm --gpus all ontresearch/medaka:latest python3 -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

$ docker run --env TF_CPP_MIN_LOG_LEVEL=3 --rm --gpus all ontresearch/medaka:latest python3 -c "import tensorflow as tf;print(tf.sysconfig.get_build_info(),tf.__version__)"
OrderedDict([('cpu_compiler', '/dt9/usr/bin/gcc'), ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '11.2'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)]) 2.10.1