Open cmdcolin opened 1 year ago
I like this idea, but sadly currently it doesn't exist.
It'd need to be in the @CO tag to avoid breaking existing parsers that validate the headers, at least until that mythical time we develop SAM 2.0. That's not ideal, but we are where we are.
I guess we could carve out a namespace within CO for additional commentary. Eg:
@CO @TAG ID:X0 TY:i DS:Number of best hits
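For anyone who wants to experiment with this, here is a minimal Python sketch of how a browser or script might read such lines. It assumes the `@CO @TAG` layout from the example above, which is only a suggestion in this thread and not part of the SAM specification. The function name `parse_tag_comments` and the idea of splitting on two-character `KEY:` prefixes are my own assumptions for illustration.

```python
import re

# Hypothetical convention from this thread (not in the SAM spec):
# a comment line starting with "@CO", then "@TAG", then KEY:value fields.
TAG_LINE = re.compile(r"^@CO\s+@TAG\s+(.*)$")

# Split only on whitespace that is followed by a two-character key and a
# colon, so a free-text DS value containing spaces stays in one piece.
FIELD_BOUNDARY = re.compile(r"\s+(?=[A-Z][A-Z0-9]:)")


def parse_tag_comments(header_lines):
    """Return {tag_id: {key: value}} for every '@CO @TAG' header line."""
    tags = {}
    for line in header_lines:
        match = TAG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # ordinary header line or free-text comment
        fields = {}
        for piece in FIELD_BOUNDARY.split(match.group(1)):
            key, sep, value = piece.partition(":")
            if sep:
                fields[key] = value
        tag_id = fields.pop("ID", None)
        if tag_id:
            tags[tag_id] = fields
    return tags


header = [
    "@HD\tVN:1.6",
    "@CO @TAG ID:X0 TY:i DS:Number of best hits",
    "@CO some unrelated free-text comment",
]
print(parse_tag_comments(header))
```

One caveat with this sketch: a description that itself contains something like ` AB:` would be split incorrectly, so a real convention would probably want DS to be the last field or the fields to be strictly tab-separated.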
You're perfectly at liberty to start doing this already, although it'd obviously need buy-in from the genome browsers. I'm not sure we'd want to add something formal to the specification unless we see active buy-in from multiple implementations.
I was wondering if there was a way or specification for SAM headers to describe what custom tags they are using, for example the lowercase and X/Y/Z prefixed tags. My angle on this is just showing users at a glance what various fields mean in a genome browser, but I can imagine it being useful in other circumstances.
VCF kind of has this, e.g. section "1.4.4 Individual format field format", which allows a file to self-describe the custom fields in its FORMAT column.
It could possibly make it easier for a human to understand a data file at a glance. Possible caveats