hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Add 'header' only output as option in 'View' function #114

Closed by alegione 1 month ago

alegione commented 1 month ago

Thanks for making and maintaining this great tool! This is a feature request rather than an issue.

Samtools view has an option to print only the header information (samtools view -H). I'm wondering if a similar flag could be added to output just the header info from a slow5/blow5 file?

The use case is that I have some old sequencing data and I want to recover all the details of how it was generated, as this wasn't well recorded or accessible in the data I had. I imagine this comes up quite often in labs.

I can use the following currently:

slow5tools view my_old_file.blow5 | head -n 44

Which pops out all the info I need (flow cell ID, flow cell code, sequencing date, run id, kit id, etc).

With a bit of thought I can probably hack together some bash commands to pull out all values between the first ## and last ## in the header, and with a bit of googling work out how to put that into a tab-delimited format or similar. But I lack the talent or know-how to do this within the confines of the tool and make a pull request, so I thought I'd raise it here in case incorporating the feature into the tool itself was straightforward (I bow to you and your expertise).

Thanks for all your work, it's so nice to be able to compress some of this old massive data (which gets the lovely flag 'weird or ancient fast5...' 😄) into something a bit more manageable.

PS. apologies if this can be achieved with another function that I've missed!

hasindu2008 commented 1 month ago

Glad you found it useful :D Well, yeh, millions of different fast5 types out there, good to get regularised.

I think the function you want is already available from slow5tools v0.7. The following command would give the header.

slow5tools skim --hdr file.blow5 

Before slow5tools v0.7, this required a shell pipeline:

slow5tools view file.blow5 | grep '^[#@]'
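A possible variation on that pipeline, as a rough sketch only: grep has to read the whole converted stream, whereas a filter can stop at the first line that does not start with `#` or `@`. This assumes all header lines come before the first read record in the text output. The function name `slow5_header_only` is made up for illustration and is not part of slow5tools.

```shell
# Hypothetical helper (not part of slow5tools): print the leading block of
# header lines (those starting with '#' or '@') from SLOW5 text on stdin,
# and stop reading at the first line that is not a header line.
slow5_header_only() {
    awk '/^[#@]/ { print; next } { exit }'
}

# Intended usage (file name is a placeholder):
#   slow5tools view file.blow5 | slow5_header_only
```

Unlike `head -n 44`, this does not depend on knowing how many header lines a particular file has.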

Some useful one-liners are listed here, in case you find them helpful: https://hasindu2008.github.io/slow5tools/oneliners.html
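For the tab-delimited table of run details mentioned earlier in the thread, a minimal sketch could look like the following. It assumes that run attributes (flow cell ID, run id, and so on) are the header lines starting with `@`, and that fields in the text output are already tab-separated; both are assumptions about the SLOW5 text layout rather than something stated in this thread. The function name `slow5_attrs_to_tsv` is hypothetical.

```shell
# Hypothetical helper (not part of slow5tools): turn the '@' attribute lines
# of a SLOW5 text header on stdin into plain "name<TAB>value" rows.
slow5_attrs_to_tsv() {
    awk '
        /^@/ { sub(/^@/, ""); print; next }  # attribute line: drop the "@"
        /^#/ { next }                        # other header lines: skip
        { exit }                             # first read record: stop
    '
}

# Intended usage (file names are placeholders):
#   slow5tools view my_old_file.blow5 | slow5_attrs_to_tsv > run_info.tsv
```

If `slow5tools skim --hdr` is available (v0.7 onwards, as noted above), its output could presumably be piped through the same filter instead of `view`.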

alegione commented 1 month ago

Haha, literally the first option/example on the page!

Thanks mate, appreciate the kind and quick response.