nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

medaka stitch throws an error about `label_scheme` #477

Closed jd3234 closed 7 months ago

jd3234 commented 7 months ago

Describe the bug

Running medaka stitch results in an error about label_scheme. I used the following command:

medaka stitch medaka/consensus_probs.hdf assembly.fasta consensus.fasta

The unpolished assembly looks ok and comparable to other assemblies of the same species. medaka consensus finished successfully. I have not seen this before and other runs finished just fine with the exact same medaka installation.

Logging

[13:53:58 - DataIndx] Loaded 1/1 (100.00%) sample files.
Traceback (most recent call last):
  File "XXX/env/bin/medaka", line 8, in <module>
    sys.exit(main())
  File "XXX/env/lib/python3.9/site-packages/medaka/medaka.py", line 769, in main
    args.func(args)
  File "XXX/env/lib/python3.9/site-packages/medaka/stitch.py", line 265, in stitch
    contigs, gt = fill_gaps(contigs, args.draft, args.fill_char)
  File "XXX/env/lib/python3.9/site-packages/medaka/stitch.py", line 127, in fill_gaps
    for info, sequence_parts, qualities in contigs:
  File "XXX/env/lib/python3.9/site-packages/medaka/stitch.py", line 175, in collapse_neighbours
    contig = next(contigs)
  File "XXX/env/lib/python3.9/site-packages/medaka/stitch.py", line 246, in stitch_regions_serial
    label_scheme = index.metadata['label_scheme']
KeyError: 'label_scheme'


cjw85 commented 7 months ago

Hi @jd3234,

It looks like your consensus_probs.hdf file is missing some of the required metadata that should have been written by medaka consensus. I'm afraid I can't really suggest how that might have happened. Do you have the logs of the medaka consensus run that preceded medaka stitch?

jd3234 commented 7 months ago

Hi @cjw85,

Thanks for your answer! While waiting for it, I had already restarted medaka consensus, as I also suspected it might be the problem. I can confirm that your assessment is correct: consensus_probs.hdf is now considerably larger, and medaka stitch ran without any problems on the new file.

I think we might have had a network or job scheduler glitch that somehow corrupted the file, even though both medaka and the job scheduler reported that the job finished successfully.

Thanks again! I am closing the issue.
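
For anyone who hits the same KeyError: a rough way to sanity-check consensus_probs.hdf before running medaka stitch might look like the sketch below. It is not part of medaka. The function names are hypothetical, and it assumes (a) the file has no HDF5 user block, so the format signature sits at byte 0, and (b) the metadata is stored as a group or dataset whose name contains `label_scheme`, which I have not verified against the medaka source. The only third-party dependency is h5py, imported lazily.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of an HDF5 file without a user block


def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 format signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def list_hdf_objects(path):
    """Return the names of all groups and datasets in an HDF5 file."""
    import h5py  # third-party; imported lazily so the other helpers work without it

    names = []
    with h5py.File(path, "r") as fh:
        fh.visit(names.append)  # append returns None, so the walk visits every object
    return names


def missing_keys(object_names, required=("label_scheme",)):
    """Return the required keys that appear in none of the object names."""
    # ASSUMPTION: metadata lives in an object whose name contains the key.
    return [key for key in required
            if not any(key in name for name in object_names)]


def check_consensus_file(path):
    """Return a list of problems found; an empty list means none were found."""
    if not has_hdf5_signature(path):
        return ["file does not start with the HDF5 signature"]
    return ["no object name contains {!r}".format(key)
            for key in missing_keys(list_hdf_objects(path))]
```

Calling `check_consensus_file("medaka/consensus_probs.hdf")` right after medaka consensus and rerunning the job if the list is non-empty would be one way to use it. An empty list only means these two checks passed, not that the file is guaranteed complete.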