Closed DrownedMala closed 2 years ago
Hello @DrownedMala
Thank you for reporting this. I suspect that the original fast5 files used ONT's latest compression, called vbz (which uses Zstandard and StreamVByte with zig-zag delta techniques). s2f creates fast5 files with the default gzip compression, which can be larger.
Could you please:
run ls DIR | wc on both directories to check that the number of files is the same, and
run h5stat to check the compression method of the fast5 file(s)? For example:
$ h5stat a_vbz_compressed.fast5 | grep filter
Dataset filters information:
NO filter: 0
GZIP filter: 0
SHUFFLE filter: 0
FLETCHER32 filter: 0
SZIP filter: 0
NBIT filter: 0
SCALEOFFSET filter: 0
USER-DEFINED filter: 100
$ h5stat a_gzip_compressed.fast5 | grep filter
Dataset filters information:
NO filter: 0
GZIP filter: 100
SHUFFLE filter: 0
FLETCHER32 filter: 0
SZIP filter: 0
NBIT filter: 0
SCALEOFFSET filter: 0
USER-DEFINED filter: 0
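The check above can also be scripted. The following is only a minimal sketch: the function name guess_fast5_compression is hypothetical, and it assumes the h5stat output has the same "GZIP filter" and "USER-DEFINED filter" lines as in the listings above. A non-zero USER-DEFINED count only suggests vbz; it does not prove it, since any third-party HDF5 filter is counted there.

```shell
# Hypothetical helper: reads `h5stat FILE` output on stdin and prints a guess
# at the compression used, based only on the dataset filter counts.
guess_fast5_compression() {
    awk -F': *' '
        /GZIP filter/         { gzip = $2 + 0 }
        /USER-DEFINED filter/ { user = $2 + 0 }
        END {
            if (user > 0 && gzip == 0)      print "user-defined (possibly vbz)"
            else if (gzip > 0 && user == 0) print "gzip"
            else if (gzip > 0 && user > 0)  print "mixed"
            else                            print "unknown"
        }'
}

# Usage (the file name is a placeholder):
# h5stat some_file.fast5 | guess_fast5_compression
```

The function reads from stdin rather than calling h5stat itself, so it can be tried on saved output as well.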
Thank you. Regards, Hiruna
Yes, thank you for your reply! I checked, and the number of files is the same in both directories. As for the compression method, here is what I get:
$ h5stat fast5_1st/FAL46657_19f232d7_0.fast5 | grep filter
Dataset filters information:
NO filter: 0
GZIP filter: 0
SHUFFLE filter: 0
FLETCHER32 filter: 0
SZIP filter: 0
NBIT filter: 0
SCALEOFFSET filter: 0
USER-DEFINED filter: 4000
$ h5stat fast5_again/FAL46657_19f232d7_0.fast5 | grep filter
Dataset filters information:
NO filter: 0
GZIP filter: 4000
SHUFFLE filter: 0
FLETCHER32 filter: 0
SZIP filter: 0
NBIT filter: 0
SCALEOFFSET filter: 0
USER-DEFINED filter: 0
Ah yes. Your original data is in vbz-compressed format. However, s2f at the moment writes files in zlib-compressed format (reported by h5stat as the GZIP filter). That is why the file size gets bigger.
In theory, you can convert any zlib fast5 to vbz fast5 using ONT's compression program. But I highly discourage this as their compression program is buggy and damages fields in their own format (see #59).
Perhaps in future we could add an option to slow5tools s2f to generate FAST5 directly in vbz. At the moment this is not a priority: once data has been converted to SLOW5 for archival purposes, the only need for converting back to FAST5 is rebasecalling with Guppy (Guppy is not open source, so we cannot contribute SLOW5 support to it). For such converted FAST5 the compression format does not matter much, as the files are temporary.
I see, thanks for the support!
Good work, keep it up! Cheers, Simone
Hello there, I was trying the conversion f2s and it all worked out pretty well, but once I tried s2f it generated a folder bigger than the original one:
du -sh fast5_1st blow5_1st fast5_again
with output:
4.5G fast5_1st
2.8G blow5_1st
6.0G fast5_again
Commands I used:
slow5tools f2s fast5_1st/ -d blow5_1st/
slow5tools s2f blow5_1st/ -d fast5_again/
I don't know if it's an issue or if it has something to do with compression mechanisms I am just not aware of, it just felt right to report back! Thanks for any reply, have a good day and keep up the good work! Simone
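A round trip like the one in this post can be sanity-checked by comparing the number of .fast5 files in the original and regenerated directories. This is a minimal sketch, not part of slow5tools: the helper names count_fast5 and same_fast5_count are hypothetical, and it assumes the .fast5 files sit directly inside each directory and that find supports -maxdepth (GNU and BSD find do).

```shell
# Hypothetical helper: count the .fast5 files directly inside a directory.
count_fast5() {
    find "$1" -maxdepth 1 -type f -name '*.fast5' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if both directories hold the same
# number of .fast5 files.
same_fast5_count() {
    [ "$(count_fast5 "$1")" -eq "$(count_fast5 "$2")" ]
}

# Usage, with the directory names from the post above:
# same_fast5_count fast5_1st fast5_again && echo "file counts match"
```

Matching counts only show that no file was dropped; they say nothing about file sizes, which depend on the compression used.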