samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

primary, secondary, and supplementary alignments with optional MM tags #739

Closed RichardCorbett closed 9 months ago

RichardCorbett commented 11 months ago

Hi folks,

Apologies if this is in the spec, but I can't find an answer there.

I'm looking at some BAMs that have primary, secondary and supplementary alignments of long reads. Is there any requirement in the spec for the optional tags that describe the read (e.g. the MM tag) to be present in all the records associated with it? In other words, if we have 5mC information in the MM tag for a read, should the MM tag be repeated for all alignments, or should it be reported once, for the primary alignment?

Thanks, Richard

jkbonfield commented 11 months ago

There aren't any real requirements for tags on any type of data.

However you do need to consider what is useful and whether there is a need for it.

If you have a non-secondary supplementary alignment, then perhaps it can be relevant to have the MM tag duplicated, as that supplementary alignment is part of the alignment. (I dislike how "primary" is normally used only for non-supplementary records. With a split read, the entire sequence as a whole has been aligned; it just happens to, e.g., span a large insert, or it is an mRNA alignment. Both halves are, in my brain at least, part of the primary alignment, but that argument is sadly lost.) I can totally imagine MM being used here.

For secondary alignments, there are lots of varying ways people do this already. Some do hard clipping, and some don't even record SEQ/QUAL at all and just have "*". So what's relevant with MM will be affected by what's happening with SEQ too.
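
Editorial aside: to see how a given file handles this, one option is to tally, per alignment class, whether MM is present and what state SEQ is in. The following is only a minimal sketch that works on SAM text lines (for example the output of `samtools view -h`). The function names are hypothetical, unmapped records are simply counted as "primary", and only the upper-case `MM` tag name is checked.

```python
from collections import Counter

SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def alignment_class(flag):
    """Classify a record by FLAG; secondary takes precedence if both bits are set."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"  # assumption: unmapped records also land here


def seq_state(cigar, seq):
    """Describe how much of the read sequence the record still carries."""
    if seq == "*":
        return "absent"
    if "H" in cigar:
        return "hard-clipped"
    return "full"


def tally_mm(sam_lines):
    """Count records by (alignment class, SEQ state, has MM tag)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar, seq = int(fields[1]), fields[5], fields[9]
        has_mm = any(tag.startswith("MM:Z:") for tag in fields[11:])
        counts[(alignment_class(flag), seq_state(cigar, seq), has_mm)] += 1
    return counts
```

A count such as `("supplementary", "hard-clipped", False)` would suggest that the aligner drops MM on records where SEQ no longer holds the full read.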

RichardCorbett commented 11 months ago

Thanks @jkbonfield,

I asked because some folks here at our centre are looking at nanopore alignments made by guppy to interrogate methylation information in IGV. The primary alignments get the methylation highlights, but the secondary/supplementary reads reportedly do not.

Might you have some magic code around to "spread" the tags from primary to other alignments?

jkbonfield commented 11 months ago

Sorry, no. I'm not involved in the creation of data. You'll need to take this up with whoever writes the software you are using to align the data, or roll your own tool for migrating data around.
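
Editorial aside: a "roll your own" tool along these lines could look something like the sketch below. It is not from any existing tool, and `spread_mod_tags` is a hypothetical name. It copies MM, ML and MN (if present) from each primary record to the secondary/supplementary records of the same read, and it assumes that the base-modification tags describe the read in its original sequenced orientation, as the SAMtags document appears to specify, so the tag values are copied unchanged regardless of strand. Records with hard clipping, with SEQ set to "*", or with a SEQ length that differs from the primary are left alone, because the tags would no longer line up with the stored sequence.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
READ_PAIR_BITS = 0xC0           # first/last-in-pair bits, kept so mates are not mixed
MOD_TAGS = ("MM", "ML", "MN")   # base-modification tags to propagate


def spread_mod_tags(sam_lines):
    """Return SAM lines with modification tags copied from primary records
    to secondary/supplementary records that carry the full read sequence."""
    lines = [line.rstrip("\n") for line in sam_lines]

    # Pass 1: remember each primary record's tags and SEQ length.
    donors = {}
    for line in lines:
        if line.startswith("@"):
            continue
        f = line.split("\t")
        flag = int(f[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        tags = [t for t in f[11:] if t[:2] in MOD_TAGS]
        if tags and f[9] != "*":
            donors[(f[0], flag & READ_PAIR_BITS)] = (len(f[9]), tags)

    # Pass 2: append the tags where it is safe to do so.
    out = []
    for line in lines:
        if line.startswith("@"):
            out.append(line)
            continue
        f = line.split("\t")
        flag = int(f[1])
        donor = donors.get((f[0], flag & READ_PAIR_BITS))
        is_target = bool(flag & (SECONDARY | SUPPLEMENTARY))
        already_tagged = any(t[:2] in MOD_TAGS for t in f[11:])
        if donor and is_target and not already_tagged:
            seq_len, tags = donor
            full_seq = f[9] != "*" and "H" not in f[5] and len(f[9]) == seq_len
            if full_seq:
                f.extend(tags)
        out.append("\t".join(f))
    return out
```

The sketch holds the whole file in memory and makes two passes, which is fine for a small test file but not for a large BAM; a real tool would more likely work on name-grouped input. Whether a viewer then displays the copied tags on those records is something to check rather than assume.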

RichardCorbett commented 11 months ago

Thanks @jkbonfield. Have a great day.

jkbonfield commented 9 months ago

Closing this as it's not really a specification issue and more of a tooling one.