hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Bad fast5 attribute pore type #85

Closed by mbhall88 1 year ago

mbhall88 commented 1 year ago

I got this error when using f2s in v0.6.0.

Bad fast5: Attribute read_008c8f1f-f32f-4b23-ab6f-1ef4f4531a20/pore_type in /nfs/research/zi/mbhall/tech_wars/data/madagascar/nanopore/raw_data/md_tb_reseq_2019_5/multi_fast5s/BC76_r0b1235_0.fast5 is duplicated and has two different values in two different places. Please report this with an example FAST5 file at 'https://github.com/hasindu2008/slow5tools/issues' for us to investigate.

Here is the fast5 that caused it. BC76_r0b1235_0.fast5.gz

hasindu2008 commented 1 year ago

Hi @mbhall88 Is this a very recently generated FAST5 file? @hiruna72 can you have a look at this attached file please?

hiruna72 commented 1 year ago

Hello @mbhall88,

Thank you very much for reporting this issue. The file you sent does not have an expected attribute (file_type) which is usually found at the root group. The file was misidentified as a single-fast5 file and that caused f2s to error out. The error message was saying something completely different. However, that helped us to discover another bug related to pore_type.
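To illustrate the misidentification described above, here is a minimal sketch in Python. It is not how f2s is implemented: `classify_fast5` and its fallback heuristic (treating top-level groups named `read_...` as a sign of a multi-fast5) are hypothetical, based only on the `file_type` attribute mentioned here and the `read_<uuid>` group name in the error message.

```python
def classify_fast5(root_attrs, top_level_groups):
    """Guess the FAST5 layout from root attributes and top-level group names.

    root_attrs: mapping of attribute names on the root group.
    top_level_groups: names of the groups directly under the root.
    """
    # Usual case: multi-fast5 files carry a file_type attribute on the root group.
    if "file_type" in root_attrs:
        return "multi"
    # Hypothetical fallback (an assumption, not slow5tools behaviour):
    # every top-level group looks like read_<uuid>, as in the reported file.
    if top_level_groups and all(n.startswith("read_") for n in top_level_groups):
        return "multi (file_type missing)"
    # Without this fallback, a file like the one reported lands here.
    return "single"


# With h5py installed (assumed), the inputs could be gathered like this:
#   import h5py
#   with h5py.File("BC76_r0b1235_0.fast5", "r") as f:
#       print(classify_fast5(f.attrs, list(f.keys())))
```

A check that relies only on `file_type` would take the last branch for the attached file, which matches the single-fast5 misidentification described above.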

Screenshot from 2022-09-26 16-12-07

Could you please share any information on how this file was generated?

mbhall88 commented 1 year ago

This sequencing run was done a few years ago, around 2019 I think. There's not really anything else I can think of that is out of the ordinary. I thought that maybe I merged a bunch of single fast5s, but the file name makes me think not (I normally use the default prefix in the ONT fast5 API single-to-multi tool, which renames them batch_0 etc.)

What other kind of info do you need?

hasindu2008 commented 1 year ago

It seems this is one of those files from the period when ONT was transitioning from single-fast5 to multi-fast5. The file structure is somewhere between a real multi-fast5 and a single-to-multi-converted fast5. Do you have many datasets like this, or only a few? We have to think about how to handle this in f2s without affecting the types of fast5 files it already works with.

Till then, could you please use this script? I wrote it to handle multi-fast5 files generated using ONT's single_to_multi_fast5 when multiple run IDs are mixed up in the same file. I ran this script on your file and it worked. Make sure you have multi_to_single_fast5 (which comes with the ONT fast5 API), h5dump (apt-get install hdf5-tools), parallel (apt-get install parallel) and slow5tools in PATH.
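Before starting a multi-day run, it may be worth confirming that the four tools listed above are actually on PATH. A minimal sketch using only the Python standard library follows; the helper name `missing_tools` is hypothetical and is not part of slow5tools or the script.

```python
import shutil

# Tool names taken from the comment above.
REQUIRED_TOOLS = ["multi_to_single_fast5", "h5dump", "parallel", "slow5tools"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in a test; in normal use the default `shutil.which` is enough.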

mbhall88 commented 1 year ago

I've got about 5/6 runs I think. I wouldn't worry about trying to support files like this if they're a weird transition/corner case. Gotta maintain your sanity!

The link for that script takes me to a 404 not found.

hasindu2008 commented 1 year ago

Ohh, there was a typo in the link, and sorry, I did not see the 404 part until @hiruna72 told me. The link is https://github.com/hasindu2008/slow5tools/blob/master/scripts/mixed-multi-fast5-to-blow5.sh

hasindu2008 commented 1 year ago

@mbhall88 Did this script work?

mbhall88 commented 1 year ago

I think so. Sorry, I haven't forgotten about this. I am running it on a collection of nanopore runs and it's taking a few days. Hopefully the jobs will finish over the weekend. I'll close this issue when/if they finish successfully if that's okay?

hasindu2008 commented 1 year ago

Sure. No prob.

mbhall88 commented 1 year ago

That script seems to have done the trick! Thanks a lot @hasindu2008