nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

The labels in the "end_reason" data type get lost when using compress_fast5 #70

Closed hasindu2008 closed 2 years ago

hasindu2008 commented 2 years ago

Hi,

The compress_fast5 tool in ont_fast5_api seems to convert the end_reason attribute from an enum data type to uint8_t. We believe this is lossy: once the enum labels are gone, there is no way to tell which number corresponds to which label. Is this intended, or is it a potential bug? Is there any way to recover the lost labels once compress_fast5 has been used?

Original fast5: image

Command used: compress_fast5 -i a/ -s b/ -c vbz

Converted fast5: image

compress_fast5 --version 4.0.0

Regards -Hasindu
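
One way to check whether a given file is affected is to inspect the HDF5 datatype of the attribute rather than its value, since the value reads back as an integer either way. The following is a minimal sketch using h5py, not part of ont_fast5_api. It assumes a multi-read fast5 layout in which each top-level read group has a `Raw` subgroup carrying the `end_reason` attribute; that layout and the helper names `classify_end_reason` and `scan_fast5` are assumptions for illustration.

```python
def classify_end_reason(enum_members):
    """Classify the result of h5py.check_enum_dtype() for an end_reason attribute.

    enum_members is a dict of label -> integer when the attribute is still
    stored as an HDF5 enum, or None when it is stored as a plain integer.
    """
    if enum_members:
        return "enum"
    return "labels_lost"


def scan_fast5(path):
    """Yield (read_name, status) for each read in a multi-read fast5 file.

    Assumed layout (not confirmed in this thread): <read group>/Raw holds
    the end_reason attribute.
    """
    import h5py  # third-party; imported lazily so the helper above has no dependencies

    with h5py.File(path, "r") as f5:
        for name, read in f5.items():
            raw = read.get("Raw") if isinstance(read, h5py.Group) else None
            if raw is None or "end_reason" not in raw.attrs:
                continue
            # Look at the stored datatype, not the value.
            dtype = raw.attrs.get_id("end_reason").dtype
            yield name, classify_end_reason(h5py.check_enum_dtype(dtype))
```

Running `list(scan_fast5("b/some_file.fast5"))` on a converted file (the path is a placeholder) would list which reads still carry enum labels and which have been reduced to plain integers.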

fbrennen commented 2 years ago

Hi @hasindu2008 -- thanks for letting us know. If the enum labels are being lost then it's definitely a bug. We'll have a look.

fbrennen commented 2 years ago

Hi @hasindu2008 -- we think we've fixed this in ont-fast5-api 4.0.2. Can you try that out?

hasindu2008 commented 2 years ago

Yes, it seems to retain the enum labels now. Credit should go to @mattloose, who found this issue while attempting to convert such fast5 files to slow5.

He might be interested in a method that potentially recovers the lost enum labels in his fast5 files converted using compress_fast5.

fbrennen commented 2 years ago

Great to hear @hasindu2008! We investigated recovering the labels, but unfortunately the way enums are stored has the potential to vary depending on the MinKNOW version, and we didn't feel we could guarantee that it was always done correctly. If you or @mattloose has a lot of reads that you've already processed then we could likely get you a one-off enum recovery tool.

hasindu2008 commented 2 years ago

Luckily we did not use this tool on our datasets, and instead directly saved what MinKNOW produced. This issue was discovered by @mattloose while trying to convert some of his data into slow5. Perhaps he would be interested in such a tool.

Also @fbrennen, we would appreciate some thoughts from you on how we should handle this situation in slow5tools. We have currently decided to save the corrupted attribute as it is with a WARNING even though it is not a perfect solution. Alternatively, we can error out in such datasets and refer to this thread so that anyone affected can seek your help in recovering their fields. Any suggestions on alternative methods are also welcome.

fbrennen commented 2 years ago

Given it only just happened and it's relatively early in the life of slow5, I would think the best thing would be to emit a message that recommends that customers re-convert their data when you find the particular situation, and wait for the problem to go away on its own. I would also pin the minimum version of ont-fast5-api required to the one we just released.
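
The version pin suggested above could be enforced with a check along these lines. This is a minimal sketch, assuming plain dotted numeric release strings such as `4.0.2`; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of ont-fast5-api or slow5tools.

```python
MIN_FIXED_VERSION = "4.0.2"  # release reported in this thread as retaining enum labels


def parse_version(version):
    """Turn a dotted release string such as '4.0.2' into a tuple of ints.

    Only handles purely numeric components; pre-release suffixes are not
    supported in this sketch.
    """
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed, minimum=MIN_FIXED_VERSION):
    """Return True when the installed release is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In practice the installed version string could be obtained with `importlib.metadata.version("ont-fast5-api")`, and a packaging requirement of `ont-fast5-api>=4.0.2` expresses the same pin declaratively.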

Kevinzjy commented 1 year ago

Hi @fbrennen .

We compressed some samples using the "corrupted" version of ont-fast5-api, and have recently run into the "end_reason" issue.

Is there any possibility that you could provide a script for converting the integers back to an enum? It seems that the MinKNOW version is recorded in the fast5 files, so in theory it should be possible to convert them back according to the MinKNOW version.

image

Thanks.

Jinyang
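
The idea in the comment above, mapping stored integers back to labels, can be sketched as a simple lookup. This is only an illustration under stated assumptions: the value-to-label table must be supplied from a trusted source for the MinKNOW version that wrote the file, since the thread notes that enum storage can vary between versions. The function name `restore_labels` and the placeholder labels are hypothetical, and no real label table is given here.

```python
def restore_labels(values, labels_by_value):
    """Map stored integer end_reason values back to their labels.

    labels_by_value must come from a trusted source for the MinKNOW version
    that produced the file; this function never guesses a label.
    """
    restored = []
    for value in values:
        if value not in labels_by_value:
            # Refuse to continue rather than silently mislabel a read.
            raise KeyError(f"no label known for end_reason value {value}")
        restored.append(labels_by_value[value])
    return restored


# Hypothetical placeholder table; real labels are deliberately not filled in.
EXAMPLE_TABLE = {0: "label_for_0", 1: "label_for_1"}
```

Failing on an unknown value, rather than falling back to a default, keeps a mismatch between the table and the file visible instead of producing plausible but wrong labels.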