nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

--sanitize can't be used with --in_place with compress_fast5 #58

Closed mattloose closed 3 years ago

mattloose commented 3 years ago

It would be nice to be able to sanitize files with the --in_place option. Is there a reason why this can't be done?

fbrennen commented 3 years ago

Hi @mattloose, unfortunately you can't just "delete" data from an HDF5 file: the space the deleted data occupied stays allocated in the file, so the file size doesn't decrease. To actually shrink a file you need to write a new file and then copy it back over the original one. The HDF Group supplies standalone programs to do this, such as h5repack, which writes out a new, "repacked" file that can then replace the original one. Older tools like picopore required a separate installation of those programs in order to accomplish an in-place file size reduction, but we decided not to do that with ont_fast5_api.

It would be entirely possible for us to write new files to a temporary directory, and then copy them back over the original ones when sanitization is complete, thus accomplishing the end "in-place" goal. Is that something you would find useful?
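For anyone who wants to do this by hand in the meantime, the write-then-replace idea could look roughly like the sketch below. This is not ont_fast5_api code: `rewrite_in_place` and `drop_comment_lines` are hypothetical names, and the toy rewrite step is only a stand-in for whatever actually produces the sanitized file at a new path.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_in_place(path: Path, rewrite: Callable[[Path, Path], None]) -> None:
    """Run rewrite(src, dst) into a temp file, then swap it over the original.

    Hypothetical helper: `rewrite` is any step that reads `src` and writes a
    complete new file to `dst` (e.g. a sanitize/repack step).
    """
    path = Path(path)
    # Put the temp file next to the original so the final rename stays on the
    # same filesystem, which lets os.replace swap it in as a single step.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=path.suffix + ".tmp")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        rewrite(path, tmp)
        os.replace(tmp, path)  # the original is only touched once dst is complete
    except BaseException:
        # On any failure, leave the original untouched and clean up the temp file.
        tmp.unlink(missing_ok=True)
        raise


def drop_comment_lines(src: Path, dst: Path) -> None:
    """Toy stand-in for a sanitize step: copy src to dst without '#' lines."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if not line.startswith("#"):
                fout.write(line)
```

Calling `rewrite_in_place(some_path, drop_comment_lines)` leaves a single file at `some_path` with the filtered contents. If the rewrite step raises, the original file is left as it was.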

mattloose commented 3 years ago

Hi @fbrennen, thanks for getting back to me. Yes, I realised later that this is a fundamental limitation of HDF5, and it's easy enough to do manually. I'd just forgotten that restriction.

Thanks.