single_to_multi_fast5 compression is not working?

nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software


single_to_multi_fast5 compression is not working? #38

Closed: svennd closed this issue 4 years ago

svennd commented 4 years ago

In the latest version I can't seem to compress and convert from single-read to multi-read files in one step:

single_to_multi_fast5 --version
3.1.3

using:

 single_to_multi_fast5 -i input -s output.mod -t 16 --recursive -c vbz

It does not give an error, but the output is not compressed. Only the two-step route (single -> multi, then multi -> compressed multi) actually gives compression:

Sizes:

187M    ori (original)
134M    ori.uncompressed (single -> multi)
134M    ori.modC (single -> multi + vbz)
93M     ori.compress (multi -> vbz)
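The sizes above are easier to compare as a fraction of the original. A minimal sketch using awk; the helper name pct_of is hypothetical, and the inputs are simply the MB figures from the listing:

```shell
# Hypothetical helper: print PART as a percentage of WHOLE, one decimal place.
pct_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

# Sizes in MB, taken from the listing above; ori (187M) is the reference.
pct_of 134 187   # ori.uncompressed and ori.modC -> 71.7
pct_of 93 187    # ori.compress -> 49.7
```

Both ori.uncompressed and ori.modC come out at about 71.7% of the original, while ori.compress is about 49.7%, which is what you would expect if -c vbz had no effect during the merge.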

Am I doing something wrong? Obviously I can do it in two steps, but since you were kind enough to make it possible in one command, it would be nice to use that.

fbrennen commented 4 years ago

Hi @svennd -- it should work just like you describe, with compression happening as part of single_to_multi. We'll have a look. Just to confirm, is this what you did?

# Uncompressed, 134M result
single_to_multi_fast5 -i original -s uncompressed -t 16 --recursive
# ModC result
single_to_multi_fast5 -i original -s modC -t 16 --recursive -c vbz
# Compress result
compress_fast5 -i modC -s compress -c vbz
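Until the one-step conversion compresses correctly, the uncompressed and compress commands above can be chained in a small function. This is a sketch, not part of ont_fast5_api: the function name convert_then_compress and the intermediate directory suffix .uncompressed are hypothetical, and it uses only the flags that appear in this thread.

```shell
# Hypothetical wrapper for the two-step workaround (not part of ont_fast5_api).
# $1 = directory of single-read fast5 files, $2 = final output directory,
# $3 = thread count for the merge step (defaults to 16, as in this thread).
convert_then_compress() {
  local input_dir="$1"
  local output_dir="$2"
  local threads="${3:-16}"
  local tmp_dir="${output_dir}.uncompressed"   # intermediate multi-read files

  # Step 1: single-read -> multi-read, without compression
  single_to_multi_fast5 -i "$input_dir" -s "$tmp_dir" -t "$threads" --recursive || return 1

  # Step 2: multi-read -> VBZ-compressed multi-read
  compress_fast5 -i "$tmp_dir" -s "$output_dir" -c vbz || return 1
}
```

For example, convert_then_compress original compress 16. The intermediate directory is left in place so it can be inspected or deleted afterwards.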

svennd commented 4 years ago

Yes, exactly. It might be that this dataset is too old (October 2017).

I would expect ori.compress to match ori.modC, since both should be multi-read fast5 files with VBZ compression.

I checked using:

h5dump -pH batch_0.fast5 | grep DEFLATE

and it turns out those files are not VBZ-compressed but just gzip-compressed, as the file size suggests.
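The same h5dump check can be extended to tell VBZ apart from gzip. A minimal sketch, assuming that h5dump -pH lists VBZ as a user-defined filter with FILTER_ID 32020 (the HDF5 filter ID registered for VBZ) and gzip as COMPRESSION DEFLATE; the function name fast5_compression is hypothetical:

```shell
# Hypothetical helper: read h5dump -pH output on stdin and print
# "vbz", "gzip" or "none" depending on which filter it mentions.
# Assumes VBZ appears as a user-defined filter with FILTER_ID 32020
# and gzip appears as "COMPRESSION DEFLATE".
fast5_compression() {
  local dump
  dump="$(cat)"
  if printf '%s\n' "$dump" | grep -Eq 'FILTER_ID[[:space:]]+32020'; then
    echo vbz      # at least one dataset uses the VBZ filter
  elif printf '%s\n' "$dump" | grep -q 'DEFLATE'; then
    echo gzip     # DEFLATE (gzip) present, but no VBZ
  else
    echo none
  fi
}
```

Usage: h5dump -pH batch_0.fast5 | fast5_compression. Because a file can contain datasets with different filters, the function reports vbz as soon as any dataset uses it.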

fbrennen commented 4 years ago

Hi @svennd -- it does indeed look like we're not currently compressing data from single-read files when they're merged by single_to_multi. We'll get this fixed.

fbrennen commented 4 years ago

Hi @svennd -- should be all fixed now (in version 3.1.4). Have a try!
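Since the fix is tied to a specific release, a script can check the installed version before relying on -c vbz during the merge. A sketch that assumes single_to_multi_fast5 --version prints a bare version string (as in the 3.1.3 output above) and that sort supports -V; version_at_least is a hypothetical name:

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on sort -V (version sort), available in GNU coreutils.
version_at_least() {
  local lowest
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# Usage, assuming --version prints a bare version string such as 3.1.3:
#   if version_at_least "$(single_to_multi_fast5 --version)" 3.1.4; then
#     echo "single_to_multi_fast5 -c vbz should compress in one step"
#   else
#     echo "use the two-step single_to_multi_fast5 + compress_fast5 route"
#   fi
```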

svennd commented 4 years ago

I tried it and it works now. Thanks for the help!

On Fri, 12 Jun 2020 at 18:17, Forrest Brennen <notifications@github.com> wrote:

Hi @svennd -- should be all fixed now (in version 3.1.4). Have a try!
