Closed: mattloose closed this issue 3 years ago
Hi @mattloose -- unfortunately, you can't just "delete" data from an hdf5 file (it remains in the file and the filesize doesn't decrease). In order to accomplish this you'd need to write new files and then copy them back over the original ones. The HDF group supplies standalone programs to do this, such as h5repack, which writes out a new, "repacked" file that can then replace the original one. Older tools like picopore required a separate installation of those programs in order to accomplish an in-place filesize reduction, but we decided not to do that with ont_fast5_api.
It would be entirely possible for us to write new files to a temporary directory, and then copy them back over the original ones when sanitization is complete, thus accomplishing the end "in-place" goal. Is that something you would find useful?
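For anyone doing this manually in the meantime, the temporary-directory approach described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not how ont_fast5_api does or would implement it: `rewrite_in_place` and the `write_clean_copy` callback are hypothetical names, and the callback stands in for whatever actually writes the sanitized or repacked file.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_in_place(path: Path, write_clean_copy: Callable[[Path, Path], None]) -> None:
    """Replace `path` with a freshly written copy of itself.

    `write_clean_copy(src, dst)` is a hypothetical callback that reads the
    original file `src` and writes a new, smaller file to `dst`.
    """
    path = Path(path)
    # Create the temporary directory next to the original so the final
    # replace stays on one filesystem and does not need a cross-device copy.
    with tempfile.TemporaryDirectory(dir=path.parent) as tmp_dir:
        tmp_path = Path(tmp_dir) / path.name
        write_clean_copy(path, tmp_path)
        # Only reached if the writer succeeded, so a failed write leaves
        # the original file untouched.
        os.replace(tmp_path, path)
```

If h5repack is installed and on the PATH (an assumption), the callback could be as simple as a function that calls `subprocess.run(["h5repack", str(src), str(dst)], check=True)`.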
Hi @fbrennen - thanks for getting back to me - yes I realised later that that is a fundamental issue with hdf5 - it's easy enough to do this manually. I'd just forgotten that restriction.
Thanks.
It would be nice to be able to sanitize files with the --in_place option. Is there a reason why this can't be done?