Closed Shians closed 3 months ago
Strictly speaking you are correct, but the majority of extant files with no modifier are in fact intended to be "?". The modifier has been part of the spec for some time now. Can I ask what tool or pipeline is still producing files without it specified?
Also, do you know from details of the experiment that this file should in fact be interpreted as "."? It would be unusual.
Thanks for the very fast reply. This particular experiment definitely intended the missing flag to indicate "?". However, as a maintainer of a tool that needs to parse such data, I am reluctant to code against the spec. I see you hit the same conundrum a few years ago (https://github.com/samtools/hts-specs/issues/654), and it doesn't seem like there's a satisfactory resolution.
I don't have any examples of real data where it should be treated as ".", and I hope to never see BAM files without the flag again. I will follow the precedent set by IGV, as it sounds like that will most likely produce the result users expect.
It's not an ideal situation, but I think assuming "?" is safer than assuming ".". By assuming "?" you are not making any assumptions about the presence or absence of the modification. If you assume "." you are stating, in effect, that the modification is known to be absent at every position not listed. Since we know that many, if not most, files produced before this modifier was introduced did not intend to make statements about modifications that were not recorded, I think we have to go with the "don't know" option. No current tools should be producing files without these modifiers.
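For tool maintainers facing the same choice, here is a minimal sketch of one way to keep the decision in a single place. It is not IGV's code; `parse_mm`, `ModEntry`, and the `default_mode` parameter are hypothetical names, and the pattern reflects my reading of the MM grammar in the SAM tags spec. The parser records whether the mode flag was written explicitly and fills in a configurable default when it was not.

```python
import re
from typing import NamedTuple, Tuple

# One MM entry: base, strand, modification code(s), optional mode flag, skip counts.
# Assumption: this mirrors the MM grammar as described in the SAM tags spec.
_MM_ENTRY = re.compile(r"^([ACGTUN])([+-])([a-z]+|[0-9]+)([.?]?)((?:,[0-9]+)*)$")


class ModEntry(NamedTuple):
    base: str
    strand: str
    codes: str
    mode: str                # "." or "?", after applying the default
    explicit: bool           # True if the flag was present in the tag
    skips: Tuple[int, ...]   # skip counts between modified bases


def parse_mm(value: str, default_mode: str = "?"):
    """Parse an MM tag value; default_mode is used when no flag is written."""
    if default_mode not in (".", "?"):
        raise ValueError("default_mode must be '.' or '?'")
    entries = []
    for chunk in value.split(";"):
        if not chunk:  # skip the empty piece after the trailing ';'
            continue
        m = _MM_ENTRY.match(chunk)
        if m is None:
            raise ValueError(f"malformed MM entry: {chunk!r}")
        base, strand, codes, mode, deltas = m.groups()
        # deltas looks like ",5,12,0"; drop the empty piece before the first comma
        skips = tuple(int(x) for x in deltas.split(",")[1:]) if deltas else ()
        entries.append(ModEntry(base, strand, codes, mode or default_mode,
                                bool(mode), skips))
    return entries


entries = parse_mm("C+m,5,12,0;C+h?,1;")
print(entries[0].mode, entries[0].explicit)  # ? False
print(entries[1].mode, entries[1].explicit)  # ? True
```

Keeping the `explicit` field lets a tool warn the user when it had to guess, and passing `default_mode="."` gives the strict reading of the spec for anyone who needs it.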
As of IGV 2.17.4 03/23/2024 (apologies if old version, I don't have installation rights on this machine), the parsing for BAM modification tags seems to be incorrect for MM tags where there is no "." or "?". My interpretation of the spec is that when the modifier is missing, it is to be interpreted as ".", but the behavior seems to be that it is interpreted as "?".
Below is the screenshot of a subset of data from https://github.com/human-pangenomics/HPP_Year1_Assemblies. The original file does not have a modifier (middle track); I manually added "." to one copy of the BAM file (top track) and "?" to another (bottom track). The expectation is for the middle track to be identical to the top track, but it is instead identical to the bottom track.
The 3 bam files are attached and the data is found in the region chrX:72283997-72286054.
bam_files.zip
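For anyone wanting to reproduce the three tracks, the manual edit described above amounts to a string rewrite of each MM tag value. The sketch below is an illustration only, not the exact procedure used for the attached files: `with_explicit_mode` is a hypothetical helper, it operates on the tag value alone, and reading or writing the BAM records themselves is left to whatever library you already use.

```python
import re

# Matches the start of each MM entry: base, strand, code(s), and an optional mode flag.
# Skip-count lists contain only digits and commas, so they cannot match this pattern.
_MM_HEADER = re.compile(r"([ACGTUN][+-](?:[a-z]+|[0-9]+))([.?]?)(?=[,;]|$)")


def with_explicit_mode(mm_value: str, mode: str) -> str:
    """Add `mode` to every MM entry that lacks a flag; leave explicit flags alone."""
    if mode not in (".", "?"):
        raise ValueError("mode must be '.' or '?'")
    return _MM_HEADER.sub(lambda m: m.group(1) + (m.group(2) or mode), mm_value)


print(with_explicit_mode("C+m,5,12,0;C+h?,1;", "."))  # C+m.,5,12,0;C+h?,1;
print(with_explicit_mode("C+m,5,12,0;", "?"))         # C+m?,5,12,0;
```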