hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Help with a tricky FAST5 #103

Closed: hengjwj closed this issue 7 months ago

hengjwj commented 7 months ago

Hi @hasindu2008,

I'm having difficulty converting one of my FAST5 files (number 20 of 182) to BLOW5:

[s190075@hpc-amd004 workspace]$ ~/bin/slow5tools-v1.1.0/slow5tools f2s -p 16 FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5 -d b5
[f2s_main] 1 fast5 files found - took 0.001s
[f2s_main] Just before forking, peak RAM = 0.000 GB
[f2s_iop] 1 proceses will be used.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file 'FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5'.

I tried to convert it to single-read FAST5 to remove the problematic reads, but ONT's multi_to_single_fast5 failed. Strangely, however, Guppy managed to basecall the file without incident, although the resulting FASTQ contained only 73 reads (4,000 were expected). I used Guppy's --fast5_out option to try to regenerate an intact FAST5, but I got the same errors (the output above is actually from the regenerated file).

Do you have other ideas on how to salvage it? I've uploaded the fast5 from Guppy if you want to try: https://entuedu-my.sharepoint.com/:f:/g/personal/s190075_e_ntu_edu_sg/EumhbBBCZ3JBsMjK_4TiU6IB58mzsZG81Z9t6LV6HKrIzA?e=NfB79x

Joel

Psy-Fer commented 7 months ago

Hey,

Did you get an error from multi_to_single_fast5 that you could share?

We will have a look. James

hasindu2008 commented 7 months ago

I tried running the HDF5 utilities on that file and they failed too.

 h5dump in/FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5
h5dump error: internal error (file ../../../../tools/h5dump/h5dump.c:line 1485)

This seems like a highly corrupted file. Surprisingly, Guppy runs as you mentioned and basecalls only ~70 reads, but it likely just continues while ignoring any errors, so it would be hard to trust any basecalls coming out of it either.

Are you facing this on many files or just this one?

Psy-Fer commented 7 months ago

I tried h5py on that file and got this

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> f = h5py.File("FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5", 'r')
>>> f
<HDF5 file "FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5" (mode r)>
>>> f.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jamfer/.local/lib/python3.10/site-packages/h5py/_hl/base.py", line 386, in __str__
    return "<KeysViewHDF5 {}>".format(list(self))
  File "/usr/lib/python3.10/_collections_abc.py", line 881, in __iter__
    yield from self._mapping
  File "/home/jamfer/.local/lib/python3.10/site-packages/h5py/_hl/group.py", line 471, in __iter__
    for x in self.id.__iter__():
  File "h5py/h5g.pyx", line 128, in h5py.h5g.GroupIter.__next__
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5l.pyx", line 316, in h5py.h5l.LinkProxy.iterate
RuntimeError: Link iteration failed (incorrect metadata checksum after all read attempts)

Yeah, something is very wrong with that file.

hengjwj commented 7 months ago

> Hey,
>
> Did you get an error from multi_to_single_fast5 that you could share?
>
> We will have a look. James

Sorry for the late reply. Here's what I got from multi_to_single_fast5:

[s190075@hpc-amd004 workspace]$ multi_to_single_fast5 -i FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5 -s singleread -t 16
ERROR:ont_fast5_api.conversion_tools.multi_to_single_fast5:Link iteration failed (incorrect metadata checksum after all read attempts)   |  0% ETA:  --:--:--
        Failed to copy files from: FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5
| 1 of 1|##################################################################################################################################|100% Time: 0:00:00

hengjwj commented 7 months ago

> I tried running the HDF5 utilities on that file and they failed too.
>
>  h5dump in/FAN33287_8224016851906804b27023975e7e67f55f73adea_19.fast5
> h5dump error: internal error (file ../../../../tools/h5dump/h5dump.c:line 1485)
>
> This seems like a highly corrupted file. Surprisingly, Guppy runs as you mentioned and basecalls only ~70 reads, but it likely just continues while ignoring any errors, so it would be hard to trust any basecalls coming out of it either.
>
> Are you facing this on many files or just this one?

Just this one file. Ok, I think I'll just drop it then.

Thanks @Psy-Fer and @hasindu2008!

hasindu2008 commented 7 months ago

Closing this issue. Feel free to reopen or start a new issue. Glad to help.