nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Bad heap free list error in medaka stitch #485

Closed SergeWielhouwer closed 6 months ago

SergeWielhouwer commented 6 months ago


Describe the bug

I am trying to polish a Spodoptera frugiperda genome assembly from Flye v2.9.2 with medaka, using the command

medaka_consensus -i filtered_long_reads/105828-001-002_long.fastq.gz -d assembly/105828-001-002/flye/assembly.fasta \
-o assembly/105828-001-002/medaka_polished/ -t 32 -m r1041_e82_400bps_sup_v4.2.0 2>logs/medaka.105828-001-002.log

on an HPC cluster with the SLURM job manager (200 GB RAM reserved for the job). I encounter an error during the final stitching step (see below). I have already tried rerunning the tool after removing all medaka output files.

Logging

From medaka.105828-001-002.log ``` Cannot import pyabpoa, some features may not be available. Cannot import pyabpoa, some features may not be available. Cannot import pyabpoa, some features may not be available. Cannot import pyabpoa, some features may not be available. Cannot import pyabpoa, some features may not be available. [12:10:09 - MdlStrTF] Successfully removed temporary files from /tmp/tmpsqp_qpv5. Cannot import pyabpoa, some features may not be available. [12:10:09 - MdlStrTF] Successfully removed temporary files from /tmp/tmpp03qnd0k. Cannot import pyabpoa, some features may not be available. [12:10:10 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. 
[12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:11 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. 
[12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files. [12:10:12 - DataIndx] Loaded 1/1 (100.00%) sample files.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 106, in stitch_from_probs
    return _stitch_samples(samples, label_scheme, region, min_depth)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 60, in _stitch_samples
    for s, is_last_in_contig, heuristic in data_gen:
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/common.py", line 543, in trim_samples_to_region
    yield from samples
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/common.py", line 529, in _trim_ends
    for sample, last, heuristic in samples:
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/common.py", line 513, in _trim_starts
    for sample, last, heuristic in samples:
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/common.py", line 443, in trim_samples
    s1 = next(sample_gen)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/datastore.py", line 557, in yield_from_feature_files
    yield self._ds.load_sample(key)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/datastore.py", line 351, in load_sample
    group = self.fh['{}/{}'.format(self._samplepath, key)]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/h5py/_hl/group.py", line 357, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 189, in h5py.h5o.open
KeyError: 'Unable to synchronously open object (bad heap free list)'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/epi2melabs/conda/bin/medaka", line 8, in <module>
    sys.exit(main())
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/medaka.py", line 814, in main
    args.func(args)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 265, in stitch
    contigs, gt = fill_gaps(contigs, args.draft, args.fill_char)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 127, in fill_gaps
    for info, sequence_parts, qualities in contigs:
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 175, in collapse_neighbours
    contig = next(contigs)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/stitch.py", line 243, in stitch_regions_parallel
    yield from pieces
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/epi2melabs/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
KeyError: 'Unable to synchronously open object (bad heap free list)'
```

From SLURM stdout log
```
TF_CPP_MIN_LOG_LEVEL is set to '3'
Checking program versions
This is medaka 1.11.1
Program    Version    Required   Pass
bcftools   1.18       1.11       True
bgzip      1.18       1.11       True
minimap2   2.26       2.11       True
samtools   1.18       1.11       True
tabix      1.18       1.11       True
WARNING: Output assembly/105828-001-002/medaka_polished/ already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Not running medaka consensus, consensus_probs.hdf exists.
Failed to stitch consensus chunks.
```

Environment (if you do not have a GPU, write No GPU):

Additional context

Two other samples completed all medaka steps successfully. However, those samples started from 4035 and 3443 contigs, while this sample is quite fragmented, with 30756 contigs, due to low genomic coverage (5-6X). The overall polishing also took considerably longer (>2 days) than for the other two samples.

cjw85 commented 6 months ago

We've seen errors like this in the medaka stitch process previously with other users; the cause has always been that the intermediate HDF files produced by medaka consensus had become corrupt. We've never managed to isolate and reproduce the problem ourselves.
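One way to confirm whether an intermediate file is affected is to try opening every object in it, since the failure in the traceback occurs when a group is opened. The following is a minimal sketch, not part of medaka: `find_unreadable` and `check_hdf` are hypothetical helper names, and the only h5py call assumed is `h5py.File(filename, "r")`. The walker treats any object with a `keys` method as a group, which is an assumption that holds for h5py groups and for plain dictionaries.

```python
def find_unreadable(group, prefix=""):
    """Recursively open every member of a mapping-like group.

    Returns a list of (path, error message) pairs for members that
    could not be listed or opened.
    """
    problems = []
    try:
        keys = list(group)
    except (KeyError, OSError, RuntimeError) as exc:
        # Even listing the group members failed.
        return [(prefix or "/", str(exc))]
    for key in keys:
        path = f"{prefix}/{key}"
        try:
            item = group[key]
        except (KeyError, OSError, RuntimeError) as exc:
            problems.append((path, str(exc)))
            continue
        # Assumption: groups expose keys(), datasets do not.
        if hasattr(item, "keys"):
            problems.extend(find_unreadable(item, path))
    return problems


def check_hdf(filename):
    """Open an HDF5 file read-only and report objects that fail to open."""
    import h5py  # third-party; imported lazily so the walker works without it

    with h5py.File(filename, "r") as fh:
        return find_unreadable(fh)
```

Calling `check_hdf("consensus_probs.hdf")` on the file named in the SLURM log would return an empty list if every object opens. A non-empty list suggests the file should be regenerated, although an empty list does not prove the stored data itself is intact.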

Unrelated to your errors, the low coverage is likely to mean that the results output from medaka are unstable; I would advise 20X as an absolute minimum and preferably at least 30-40X.
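As a rough check against these thresholds, mean coverage can be estimated as total read bases divided by assembly size. The sketch below assumes a gzipped FASTQ with strict four-line records; the function names are hypothetical, and the 20X and 30X defaults are taken from the advice above.

```python
import gzip


def fastq_read_lengths(path):
    """Yield sequence lengths from a gzipped FASTQ with 4-line records."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.strip())


def mean_coverage(read_lengths, genome_size):
    """Estimate coverage as total read bases divided by genome size."""
    return sum(read_lengths) / genome_size


def coverage_advice(coverage, minimum=20.0, preferred=30.0):
    """Classify a coverage estimate against the suggested thresholds."""
    if coverage < minimum:
        return "below minimum"
    if coverage < preferred:
        return "above minimum, below preferred"
    return "ok"
```

For the sample in this issue, an estimate of 5-6X would be classified as "below minimum". The genome size would have to be supplied by the user, for example as the total length of the draft assembly.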

SergeWielhouwer commented 6 months ago

Thank you, I will try to run it again to see if this error occurs again. Otherwise, I will consider skipping medaka overall and directly do short read polishing on the assembly.
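The SLURM log above shows that medaka_consensus skipped the consensus step because consensus_probs.hdf already existed, so a rerun in the same output directory would reuse the suspect file. A minimal sketch for clearing it first follows; `remove_intermediates` is a hypothetical helper, and the default file name is the one reported in that log.

```python
from pathlib import Path


def remove_intermediates(outdir, names=("consensus_probs.hdf",)):
    """Delete named intermediate files from outdir; return those removed."""
    removed = []
    for name in names:
        target = Path(outdir) / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

Passing `names=("consensus_probs.hdf", "calls_to_draft.bam")` would also force the alignment to be redone, which is only worth the extra time if the BAM file is suspected as well.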