nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

Decompress/repack from VBZ to GZIP or some other option compatible with squiggle tools #34

Closed callumparr closed 4 years ago

callumparr commented 4 years ago

I have seen the VBZ compression repository containing the ONT HDF5 plugin for working with VBZ compression, but is there any plan to make a subtool within ont_fast5_api?

I am a little apprehensive about changing the -f parameter of h5repack, as I am not sure what it does exactly. It would also be good to have some system for naming output files so that they match the input when working through a list of fast5 files.

fbrennen commented 4 years ago

Hi @callumparr -- are you looking for the compress_fast5 script?

https://github.com/nanoporetech/ont_fast5_api#compress_fast5

callumparr commented 4 years ago

I am looking for something like this, but going from VBZ to GZIP, so that the fast5 output files containing squiggles generated with the Master of Pores pipeline are compatible with tailfindr and nanopolish polya:

find . -name "*.fast5" | xargs -P 10 -I % h5repack -f UD=32020,5,0,0,2,1,1 % %.vbz

Or even overwriting the input files to simplify things:

find . -name "*.fast5" | xargs -P 10 -I % sh -c "h5repack -f UD=32020,5,0,0,2,1,1 % %.vbz && mv %.vbz %"

Can I simply replace the -f filter with GZIP=1 and the .vbz suffix with .gzip?
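For reference, a minimal sketch of what that substitution could look like, using the same h5repack approach as the commands above. It assumes h5repack is on the PATH and that the VBZ HDF5 plugin is installed and discoverable (typically through the HDF5_PLUGIN_PATH environment variable), since h5repack has to decode the VBZ data before it can rewrite it. The helper names gzip_name and repack_to_gzip and the .gzip.fast5 suffix are illustrative choices, not part of h5repack or ont_fast5_api.

```shell
#!/bin/sh
# Hypothetical helper: derive an output name that matches the input name,
# e.g. reads/a.fast5 -> reads/a.gzip.fast5 (the suffix is an arbitrary choice).
gzip_name() {
    printf '%s\n' "${1%.fast5}.gzip.fast5"
}

# Hypothetical helper: repack one file with the built-in gzip (deflate)
# filter at level 1. Reading VBZ input needs the VBZ plugin to be loadable.
repack_to_gzip() {
    h5repack -f GZIP=1 "$1" "$(gzip_name "$1")"
}

# Example loop (commented out so that sourcing this file changes nothing):
# find . -name "*.fast5" ! -name "*.gzip.fast5" |
#     while IFS= read -r f; do repack_to_gzip "$f"; done
```

Writing to a separate output name first, and only replacing the original once the repack has succeeded, keeps the input intact if h5repack fails part-way through.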

fbrennen commented 4 years ago

compress_fast5 goes both ways -- you can convert the raw dataset from gzip to vbz, and also from vbz to gzip. Is that good enough?
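As a rough sketch of the vbz-to-gzip direction, the wrapper below calls compress_fast5 with the options described in the README section linked above (--input_path, --save_path, --compression, --recursive, --threads). The wrapper name to_gzip, the folder names and the default thread count are assumptions made for illustration; check compress_fast5 --help for the options available in your installed version.

```shell
#!/bin/sh
# Hypothetical wrapper around compress_fast5: rewrite every fast5 file under
# an input folder into a separate output folder using gzip compression.
#   $1 = input folder, $2 = output folder, $3 = thread count (defaults to 4)
to_gzip() {
    compress_fast5 \
        --input_path "$1" \
        --save_path "$2" \
        --compression gzip \
        --recursive \
        --threads "${3:-4}"
}

# Example call (commented out; the folder names are placeholders):
# to_gzip vbz_fast5 gzip_fast5 10
```

Because the output goes to a separate folder, the original VBZ files stay untouched and the output files can keep the same names as the inputs.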

callumparr commented 4 years ago

Yeah, sorry, I think this has more to do with my basic understanding of informatics.

fbrennen commented 4 years ago

Great, so you have everything you need then?

callumparr commented 4 years ago

Sorry, I forgot to close this.