Medaka freezing - Githubissues

lucyintheskyzzz commented 1 year ago

Hi,

I am running Medaka 1.7.1 on my Canu assembled fasta files and for some reason one of my samples will not finish the medaka run and always freezes. The sample that ran through to finish (barcode01) has a file size of 5,461 and barcode02 (the sample that freezes) has a file size of 20,781. Do you know why barcode02 freezes?

Here is the run: [kvigil@cypress01-124 ~]$ module load anaconda3/2020.07 [kvigil@cypress01-124 ~]$ source activate medaka (/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress01-124 ~]$ BASECALLS=/lustre/project/taw/ONR060922/fastq_pass/fastq_concat/barcode02.fastq (/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress01-124 ~]$ DRAFT=/lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta (/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress01-124 ~]$ OUTDIR=/lustre/project/taw/ONR060922/canu/medaka/barcode02 (/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress01-124 ~]$ NPROC=20 (/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress01-124 ~]$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_high_g330 Checking program versions This is medaka 1.7.1 Program Version Required Pass bcftools 1.15.1 1.11 True bgzip 1.15.1 1.11 True minimap2 2.24 2.11 True samtools 1.15.1 1.11 True tabix 1.15.1 1.11 True Aligning basecalls to draft Using the existing fai index file /lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta.fai Using the existing mmi index file /lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta.map-ont.mmi [M::main::0.0710.08] loaded/built the index for 11 target sequence(s) [M::mm_mapopt_update::0.0710.08] mid_occ = 2251 [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11 [M::mm_idx_stat::0.071*0.08] distinct minimizers: 2195 (93.62% are singletons); average occurrences: 2.291; average spacing: 3.856; total length: 19386

it always freezes at this part.

Thanks!

lucyintheskyzzz commented 1 year ago

Hello?

cjw85 commented 1 year ago

Hi Lucy, I believe I replied to you on the Nanopore Community.

When you say it freezes, for how long does it freeze? The above is all output from minimap2, not medaka itself: at this point in the script minimap2 is being used to align your reads to your canu assembly before running medaka's consensus program.

lucyintheskyzzz commented 1 year ago

It just freezes and stops here and nothing gets generated in the folder that was created.

cjw85 commented 1 year ago

I would install minimap2 in a distinct conda environment and trying running it in isolation. Please refer to the minimap2 documentation for how to do this.

If you observe similar behaviour it is worth reporting the issue on the minimap2 issue page.

lucyintheskyzzz commented 1 year ago

I got another barcode to work where the file size is only 12,709 bytes.

So far the .fasta files that work seem to have <20k bytes. barcode01- 5,461 bytes (produced a file) barcode02-20,781 bytes (did not produce a file) barcode03-12,709 bytes (produced a file)

It looks like medaka can only polish my smaller files. I am going to try to run all my barcodes and see what happens.
What should my output file say on Ubuntu when the polishing is finished? Is there an ending prompt that says "all files have been polished"?

cjw85 commented 1 year ago

Here is the full output of running the medaka_consensus convenience script on a small dataset:

$ medaka_consensus -i fastq/barcode01/subset.fastq.gz -d ref/reference.subseq.fa.gz -o medaka -t 48
Checking program versions
This is medaka 1.7.2
Program    Version    Required   Pass
bcftools   1.14       1.11       True
bgzip      1.14       1.11       True
minimap2   2.17       2.11       True
samtools   1.14       1.11       True
tabix      1.14       1.11       True
Aligning basecalls to draft
Constructing minimap index.
[M::mm_idx_gen::0.011*0.93] collected minimizers
[M::mm_idx_gen::0.016*1.48] sorted minimizers
[M::main::0.019*1.39] loaded/built the index for 1 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.020*1.38] distinct minimizers: 18842 (99.76% are singletons); average occurrences: 1.005; average spacing: 5.347
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -I 16G -x map-ont --MD -d <path stem redacted>/ref/reference.subseq.fa.gz.mmi <path stem redacted>/ref/reference.subseq.fa.gz
[M::main] Real time: 0.027 sec; CPU: 0.030 sec; Peak RSS: 0.006 GB
[M::main::0.009*1.08] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.010*1.08] mid_occ = 6
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.010*1.07] distinct minimizers: 18842 (99.76% are singletons); average occurrences: 1.005; average spacing: 5.347
[M::worker_pipeline::0.753*15.66] mapped 5993 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x map-ont --MD -t 48 -a -A 2 -B 4 -O 4,24 -E 2,1 <path stem redacted>ref/reference.subseq.fa.gz.mmi <path stem redacted>fastq/barcode01/subset.fastq.gz
[M::main] Real time: 0.761 sec; CPU: 11.797 sec; Peak RSS: 0.334 GB
[bam_sort_core] merging from 0 files and 48 in-memory blocks...
Running medaka consensus
[13:32:00 - Predict] Reducing threads to 2, anymore is a waste.
[13:32:00 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[13:32:00 - Predict] Processing region(s): lmonocytogenes:0-101224
[13:32:00 - Predict] Using model: <path stem redacted>medaka/data/r941_min_hac_g507_model.tar.gz.
[13:32:00 - Predict] Found a GPU.
[13:32:00 - Predict] If cuDNN errors are observed, try setting the environment variable `TF_FORCE_GPU_ALLOW_GROWTH=true`. To explicitely disable use of cuDNN use the commandline option `--disable_cudnn. If OOM (out of memory) errors are found please reduce batch size.
[13:32:00 - Predict] Processing 1 long region(s) with batching.
[13:32:00 - ModelLoad] GPU available: building model with cudnn optimization
[13:32:05 - MdlStrTF] Model <keras.engine.sequential.Sequential object at 0x7fa1c325ef40>
[13:32:05 - MdlStrTF] loading weights from /tmp/tmp8d77k64q/model/variables/variables
[13:32:05 - BAMFile] Creating pool of 16 BAM file sets.
[13:32:05 - Sampler] Initializing sampler for consensus of region lmonocytogenes:0-101224.
[13:32:05 - PWorker] Running inference for 0.1M draft bases.
[13:32:06 - Feature] Pileup counts do not span requested region, requested lmonocytogenes:0-101224, received 3-101223.
[13:32:06 - Feature] Processed lmonocytogenes:3.0-101223.0 (median depth 200.0)
[13:32:06 - Sampler] Took 1.05s to make features.
[13:32:10 - PWorker] Batches in cache: 1.
[13:32:10 - PWorker] Processed 1 batches
[13:32:10 - PWorker] All done, 0 remainder regions.
[13:32:10 - Predict] Finished processing all regions.
[13:32:16 - DataIndx] Loaded 1/1 (100.00%) sample files.
[13:32:16 - DataIndx] Loaded 1/1 (100.00%) sample files.
Polished assembly written to medaka/consensus.fasta, have a nice day.

The input files used here are: subset.fastq.gz reference.subseq.fa.gz

lucyintheskyzzz commented 1 year ago

Thanks! Looks like the r941_min_fast_g303 worked for the barcode03 bigger file!

(/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress2 ~]$ BASECALLS=/lustre/project/taw/ONR060922/fastq_pass/fastq_concat/barcode02.fastq
(/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress2 ~]$ DRAFT=/lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta
(/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress2 ~]$ OUTDIR=/lustre/project/taw/ONR060922/canu/medaka/barcode02
(/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress2 ~]$ NPROC=20
(/lustre/project/taw/share/conda-envs/medaka) [kvigil@cypress2 ~]$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_fast_g303
Checking program versions
This is medaka 1.7.1
Program    Version    Required   Pass
bcftools   1.15.1     1.11       True
bgzip      1.15.1     1.11       True
minimap2   2.24       2.11       True
samtools   1.15.1     1.11       True
tabix      1.15.1     1.11       True
Aligning basecalls to draft
Using the existing fai index file /lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta.fai
Using the existing mmi index file /lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta.map-ont.mmi
[M::main::0.036*0.14] loaded/built the index for 11 target sequence(s)
[M::mm_mapopt_update::0.036*0.14] mid_occ = 2251
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11
[M::mm_idx_stat::0.036*0.17] distinct minimizers: 2195 (93.62% are singletons); average occurrences: 2.291; average spacing: 3.856; total length: 19386
[M::worker_pipeline::5347.342*19.96] mapped 116000 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 20 -a /lustre/project/taw/ONR060922/canu/barcode02/barcode02.contigs.fasta.map-ont.mmi /lustre/project/taw/ONR060922/fastq_pass/fastq_concat/barcode02.fastq
[M::main] Real time: 5347.740 sec; CPU: 106718.482 sec; Peak RSS: 5.124 GB
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
Running medaka consensus
[10:15:55 - Predict] Reducing threads to 2, anymore is a waste.
[10:15:55 - Predict] It looks like you are running medaka without a GPU and attempted to set a high number of threads. We have scaled this down to an optimal number. If you wish to improve performance please see https://nanoporetech.github.io/medaka/installation.html#improving-parallelism.
[10:15:55 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[10:15:55 - Predict] Processing region(s): tig00000001:0-4625 tig00000002:0-2970 tig00000003:0-1124 tig00000004:0-1182 tig00000005:0-1316 tig00000006:0-1320 tig00000007:0-1136 tig00000008:0-1078 tig00000009:0-1155 tig00000010:0-2145 tig00000011:0-1335
[10:15:55 - Predict] Using model: /lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/data/r941_min_fast_g303_model.hdf5.
[10:15:55 - Predict] Processing 11 long region(s) with batching.
[10:15:55 - MdlStore] filepath /lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/data/r941_min_fast_g303_model.hdf5
[10:15:57 - BAMFile] Creating pool of 16 BAM file sets.
[10:15:57 - Sampler] Initializing sampler for consensus of region tig00000001:0-4625.
[10:15:57 - Sampler] Initializing sampler for consensus of region tig00000002:0-2970.
[10:15:57 - PWorker] Running inference for 0.0M draft bases.
[10:15:57 - Feature] Processed tig00000002:0.0-2969.0 (median depth 43.0)
[10:15:57 - Sampler] Took 0.28s to make features.
[10:15:57 - Sampler] Region tig00000002:0.0-2969.0 (5940 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:57 - Sampler] Initializing sampler for consensus of region tig00000003:0-1124.
[10:15:57 - Feature] Processed tig00000003:0.0-1123.0 (median depth 59.0)
[10:15:57 - Sampler] Took 0.28s to make features.
[10:15:57 - Sampler] Region tig00000003:0.0-1123.0 (3595 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:57 - Sampler] Initializing sampler for consensus of region tig00000004:0-1182.
[10:15:58 - Feature] Processed tig00000004:0.0-1181.0 (median depth 310.0)
[10:15:58 - Sampler] Took 0.23s to make features.
[10:15:58 - Sampler] Region tig00000004:0.0-1181.0 (6539 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000005:0-1316.
[10:15:58 - Feature] Processed tig00000001:0.0-4624.0 (median depth 914.0)
[10:15:58 - Sampler] Took 1.05s to make features.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000006:0-1320.
[10:15:58 - Feature] Processed tig00000005:0.0-1315.0 (median depth 150.0)
[10:15:58 - Sampler] Took 0.23s to make features.
[10:15:58 - Sampler] Region tig00000005:0.0-1315.0 (4109 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000007:0-1136.
[10:15:58 - Feature] Processed tig00000006:0.0-1319.0 (median depth 17.0)
[10:15:58 - Sampler] Took 0.27s to make features.
[10:15:58 - Sampler] Region tig00000006:0.0-1319.0 (2726 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000008:0-1078.
[10:15:58 - Feature] Processed tig00000007:0.0-1135.0 (median depth 20.0)
[10:15:58 - Sampler] Took 0.29s to make features.
[10:15:58 - Sampler] Region tig00000007:0.0-1135.0 (2300 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000009:0-1155.
[10:15:58 - Feature] Processed tig00000008:0.0-1077.0 (median depth 53.0)
[10:15:58 - Sampler] Took 0.18s to make features.
[10:15:58 - Sampler] Region tig00000008:0.0-1077.0 (2608 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000010:0-2145.
[10:15:58 - Feature] Processed tig00000009:0.0-1154.0 (median depth 40.0)
[10:15:58 - Sampler] Took 0.18s to make features.
[10:15:58 - Sampler] Region tig00000009:0.0-1154.0 (2401 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:58 - Sampler] Initializing sampler for consensus of region tig00000011:0-1335.
[10:15:59 - Feature] Processed tig00000011:0.0-1334.0 (median depth 100.0)
[10:15:59 - Sampler] Took 0.31s to make features.
[10:15:59 - Sampler] Region tig00000011:0.0-1334.0 (3729 positions) is smaller than inference chunk length 10000, quarantining.
[10:15:59 - Feature] Processed tig00000010:0.0-2144.0 (median depth 252.0)
[10:15:59 - Sampler] Took 0.76s to make features.
[10:15:59 - Sampler] Region tig00000010:0.0-2144.0 (7302 positions) is smaller than inference chunk length 10000, quarantining.
[10:16:04 - PWorker] Batches in cache: 1.
[10:16:04 - PWorker] Processed 1 batches
[10:16:04 - PWorker] All done, 10 remainder regions.
[10:16:04 - Predict] Processing 10 short region(s).
[10:16:04 - MdlStore] filepath /lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/data/r941_min_fast_g303_model.hdf5
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000002:0-2970.
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000003:0-1124.
[10:16:05 - PWorker] Running inference for 0.0M draft bases.
[10:16:05 - Feature] Processed tig00000002:0.0-2969.0 (median depth 43.0)
[10:16:05 - Sampler] Took 0.22s to make features.
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000004:0-1182.
[10:16:05 - Feature] Processed tig00000003:0.0-1123.0 (median depth 59.0)
[10:16:05 - Sampler] Took 0.24s to make features.
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000005:0-1316.
[10:16:05 - Feature] Processed tig00000005:0.0-1315.0 (median depth 150.0)
[10:16:05 - Sampler] Took 0.18s to make features.
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000006:0-1320.
[10:16:05 - Feature] Processed tig00000004:0.0-1181.0 (median depth 310.0)
[10:16:05 - Sampler] Took 0.21s to make features.
[10:16:05 - Sampler] Initializing sampler for consensus of region tig00000007:0-1136.
[10:16:06 - Feature] Processed tig00000006:0.0-1319.0 (median depth 17.0)
[10:16:06 - Sampler] Took 0.14s to make features.
[10:16:06 - Sampler] Initializing sampler for consensus of region tig00000008:0-1078.
[10:16:06 - Feature] Processed tig00000007:0.0-1135.0 (median depth 20.0)
[10:16:06 - Sampler] Took 0.19s to make features.
[10:16:06 - Sampler] Initializing sampler for consensus of region tig00000009:0-1155.
[10:16:06 - Feature] Processed tig00000008:0.0-1077.0 (median depth 53.0)
[10:16:06 - Sampler] Took 0.13s to make features.
[10:16:06 - Sampler] Initializing sampler for consensus of region tig00000011:0-1335.
[10:16:06 - Feature] Processed tig00000009:0.0-1154.0 (median depth 40.0)
[10:16:06 - Sampler] Took 0.12s to make features.
[10:16:06 - Sampler] Initializing sampler for consensus of region tig00000010:0-2145.
[10:16:06 - Feature] Processed tig00000011:0.0-1334.0 (median depth 100.0)
[10:16:06 - Sampler] Took 0.17s to make features.
[10:16:06 - Feature] Processed tig00000010:0.0-2144.0 (median depth 252.0)
[10:16:06 - Sampler] Took 0.64s to make features.
[10:16:27 - PWorker] Batches in cache: 8.
[10:16:27 - PWorker] 16.7% Done (0.0/0.0 Mbases) in 21.6s
[10:16:38 - PWorker] Batches in cache: 8.
[10:16:38 - PWorker] 23.7% Done (0.0/0.0 Mbases) in 33.5s
[10:16:52 - PWorker] Batches in cache: 8.
[10:16:52 - PWorker] 30.2% Done (0.0/0.0 Mbases) in 47.0s
[10:17:14 - PWorker] Batches in cache: 7.
[10:17:14 - PWorker] 37.3% Done (0.0/0.0 Mbases) in 68.6s
[10:17:23 - PWorker] Batches in cache: 6.
[10:17:31 - PWorker] Batches in cache: 5.
[10:17:31 - PWorker] 48.4% Done (0.0/0.0 Mbases) in 85.6s
[10:17:39 - PWorker] Batches in cache: 4.
[10:17:47 - PWorker] Batches in cache: 3.
[10:17:47 - PWorker] 58.6% Done (0.0/0.0 Mbases) in 102.2s
[10:18:00 - PWorker] Batches in cache: 2.
[10:18:00 - PWorker] 66.0% Done (0.0/0.0 Mbases) in 115.1s
[10:18:24 - PWorker] Batches in cache: 1.
[10:18:24 - PWorker] 78.7% Done (0.0/0.0 Mbases) in 138.8s
[10:18:24 - PWorker] Processed 10 batches
[10:18:24 - PWorker] All done, 0 remainder regions.
[10:18:24 - Predict] Finished processing all regions.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
[10:18:32 - DataIndx] Loaded 1/1 (100.00%) sample files.
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
/lustre/project/taw/share/conda-envs/medaka/lib/python3.8/site-packages/medaka/labels.py:387: RuntimeWarning: divide by zero encountered in log10
  q = -10 * np.log10(err)
Polished assembly written to /lustre/project/taw/ONR060922/canu/medaka/barcode02/consensus.fasta, have a nice day.

nanoporetech / medaka

Medaka freezing #390