nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Failed to stitch consensus chunks. (KeyError: b'contig_1') #473

Closed DaHye0205 closed 7 months ago

DaHye0205 commented 7 months ago

Medaka is a Research Release.

Research releases are provided as technology demonstrators to give early access to features or to stimulate community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources to support this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

Please ensure that you are using the most recent version of medaka before filing a bug report. The most recent version can be found on the release page. If you are not using the most recent release and file an issue regardless, the most likely response from our developers will be to ask you to upgrade first.

Please also ensure that you provide the information below; not doing so will likely result in a request for it.

Describe the bug: a clear and concise description of what the bug is, including the command that you have run.

Logging: please attach any relevant logging messages (use ``` before and after code blocks).

Environment (if you do not have a GPU, write No GPU):

Additional context

I'm trying to polish a FASTA file (created with racon polishing) once more with medaka, using the Nanopore reads (fastq.gz), but I get this error.

```
(DAHYE3.7) user@ubuntu:~/anaconda3$ NPROC=$(nproc)
(DAHYE3.7) user@ubuntu:~/anaconda3$ BASECALLS=barcode5_porechop.fastq.gz
(DAHYE3.7) user@ubuntu:~/anaconda3$ DRAFT=racon4.fasta
(DAHYE3.7) user@ubuntu:~/anaconda3$ OUTDIR=medaka_consensus
(DAHYE3.7) user@ubuntu:~/anaconda3$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_high_g360 -t 16 –b 100
2023-11-09 09:15:39.517908: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:39.621357: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-09 09:15:41.522693: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:41.609747: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Checking program versions
This is medaka 1.2.2
Program    Version    Required    Pass
bcftools   1.17       1.9         True
bgzip      1.17       1.9         True
minimap2   2.26       2.11        True
samtools   1.18       1.9         True
tabix      1.17       1.9         True
2023-11-09 09:15:43.794696: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:43.882267: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-09 09:15:45.688494: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:45.757568: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Aligning basecalls to draft
Removing previous index file /home/user/anaconda3/racon4.fasta.mmi
Removing previous index file /home/user/anaconda3/racon4.fasta.fai
Constructing minimap index.
[M::mm_idx_gen::0.085*1.03] collected minimizers
[M::mm_idx_gen::0.110*1.41] sorted minimizers
[M::main::0.142*1.32] loaded/built the index for 1 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.148*1.31] distinct minimizers: 459554 (97.25% are singletons); average occurrences: 1.038; average spacing: 5.365; total length: 2560154
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -I 16G -x map-ont --MD -d /home/user/anaconda3/racon4.fasta.mmi /home/user/anaconda3/racon4.fasta
[M::main] Real time: 0.153 sec; CPU: 0.198 sec; Peak RSS: 0.029 GB
[M::main::0.037*1.06] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.045*1.05] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.051*1.04] distinct minimizers: 459554 (97.25% are singletons); average occurrences: 1.038; average spacing: 5.365; total length: 2560154
[M::worker_pipeline::4.023*7.60] mapped 16577 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -x map-ont --MD -t 16 -a -A 2 -B 4 -O 4,24 -E 2,1 /home/user/anaconda3/racon4.fasta.mmi /home/user/anaconda3/barcode5_porechop.fastq.gz
[M::main] Real time: 4.028 sec; CPU: 30.589 sec; Peak RSS: 0.465 GB
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
Running medaka consensus
2023-11-09 09:15:53.067283: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:53.134948: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
[09:15:54 - Predict] Reducing threads to 2, anymore is a waste.
[09:15:54 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[09:15:54 - Predict] Processing region(s): contig_1:0-2560154
[09:15:54 - Predict] Using model: /home/user/anaconda3/envs/DAHYE3.7/lib/python3.7/site-packages/medaka/data/r941_min_high_g360_model.hdf5.
[09:15:54 - Predict] Found a GPU.
[09:15:54 - Predict] If cuDNN errors are observed, try setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true. To explicitely disable use of cuDNN use the commandline option --disable_cudnn. If OOM (out of memory) errors are found please reduce batch size.
[09:15:54 - Predict] Processing 3 long region(s) with batching.
[09:15:54 - ModelStore] filepath /home/user/anaconda3/envs/DAHYE3.7/lib/python3.7/site-packages/medaka/data/r941_min_high_g360_model.hdf5
[09:15:54 - ModelLoad] GPU available: building model with cudnn optimization
2023-11-09 09:15:54.160020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:15:54.223251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14393 MB memory: -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:17:00.0, compute capability: 8.6
[09:15:54 - DLoader] Initializing data loader
[09:15:54 - Sampler] Initializing sampler for consensus of region contig_1:0-1000000.
[09:15:54 - PWorker] Running inference for 2.6M draft bases.
[09:15:54 - Sampler] Initializing sampler for consensus of region contig_1:999000-1999000.
[09:15:54 - Sampler] Initializing sampler for consensus of region contig_1:1998000-2560154.
[09:15:55 - Feature] Processed contig_1:1998000.0-2560153.0 (median depth 36.0)
[09:15:55 - Sampler] Took 0.59s to make features.
[09:15:55 - Feature] Processed contig_1:0.0-999999.2 (median depth 38.0)
[09:15:55 - Sampler] Took 0.79s to make features.
[09:15:55 - Feature] Processed contig_1:999000.0-1998999.0 (median depth 35.0)
[09:15:55 - Sampler] Took 0.93s to make features.
2023-11-09 09:15:57.087766: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8800
2023-11-09 09:15:57.522387: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[09:16:00 - PWorker] All done, 0 remainder regions.
[09:16:00 - Predict] Finished processing all regions.
2023-11-09 09:16:01.555517: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-09 09:16:01.658096: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
```

```
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/DAHYE3.7/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/DAHYE3.7/lib/python3.7/site-packages/medaka/medaka.py", line 684, in main
    args.func(args)
  File "/home/user/anaconda3/envs/DAHYE3.7/lib/python3.7/site-packages/medaka/stitch.py", line 185, in stitch
    contigs, gt = fill_gaps(contigs, args.draft)
  File "/home/user/anaconda3/envs/DAHYE3.7/lib/python3.7/site-packages/medaka/stitch.py", line 121, in fill_gaps
    (ref_name, 0, contig_lengths[ref_name]), pieces))
KeyError: b'contig_1'
Failed to stitch consensus chunks.
```

plz help...........

Add any other context about the problem here.
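
As background to the `KeyError: b'contig_1'` above: this is a sketch of the Python behaviour the traceback shows, not a claim about medaka's internals. A dictionary keyed by `str` contig names raises `KeyError` when it is indexed with the same name supplied as `bytes`. A minimal example with made-up names:

```python
# Illustration only: how a bytes/str mismatch on dictionary keys produces the
# KeyError seen in the traceback. The names below are hypothetical and are not
# medaka's actual data structures.
contig_lengths = {"contig_1": 2560154}    # keys stored as str

ref_name = b"contig_1"                    # the same name, but as bytes
try:
    length = contig_lengths[ref_name]     # b"contig_1" != "contig_1"
except KeyError as err:
    print(f"KeyError: {err}")             # -> KeyError: b'contig_1'

# Decoding the bytes name makes the lookup succeed.
print(contig_lengths[ref_name.decode()])  # -> 2560154
```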

cjw85 commented 7 months ago

The log suggests you are using quite an old version of medaka. Please update to the latest version. Note also that the bioconda packages are not supported by Oxford Nanopore Technologies.

DaHye0205 commented 7 months ago

I updated using `conda update --all` and `conda update medaka`, but the version of medaka does not change. Do I need to install it using a method other than conda to get the latest version?

conda version: 23.10.0

```
$ conda update medaka
Channels:

All requested packages already installed.
```

medaka version: 1.2.2

cjw85 commented 7 months ago

Please see the README for supported methods of installation. For advice with conda, please raise an issue on the bioconda channels.
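
For completeness: after reinstalling medaka by one of the README-supported methods, it is worth confirming which build is actually active, since a stale environment was the symptom above. A minimal sketch in plain Python, assuming the package exposes `__version__` as recent releases do (the `--version` flag on the command line should give the same answer):

```python
# Quick check of which medaka installation the active environment resolves.
# Illustrative only; run it inside the (activated) environment you intend to
# use for polishing.
import medaka

print(medaka.__version__)  # expect something newer than 1.2.2 after upgrading
print(medaka.__file__)     # path shows which installation is actually in use
```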