Open kbradnam opened 10 years ago
I agree the SAM/BAM specification isn't novice-friendly, but maybe it doesn't need to be? It should be a dry, developer-centric technical document, supplemented by separate user-facing documentation written by people who use SAM/BAM.
That would be acceptable, but you would still ideally want to direct people to those more user-friendly sources of documentation from the main SAM documentation.
Most people will come across the current SAM documentation from a Google search for 'SAM format'.
There is space on SEQwiki for user-created format information.
FWIW, I've always found http://broadinstitute.github.io/picard/explain-flags.html a very handy calculator for SAM flags.
The incorporation of these binary flags in an otherwise "readable" format lets me mischievously suspect that they were intended as an obstacle in the first place.
A bit flag is a succinct way to encapsulate rich information. At the time of the first draft, it was not obvious how to represent multiple pieces of information in a readable style without greatly complicating the format. In the absence of an acceptable alternative, we kept the bit flag.
A few years later, I realized that we could use one character for each bit. This was the old `samtools view -X` output. In this representation, 99=0x63 becomes `pP1R`. It is more readable while maintaining a simple 1-to-1 translation to the bit flag. Nonetheless, the proposal was rejected by the consensus. Most considered this change too late as SAM was fairly mature.
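For readers who want to see what a value like 99=0x63 contains, here is a minimal Python sketch that splits a FLAG into the individual bits that are set. The bit values follow the FLAG table in the SAM specification; the short descriptions are paraphrased, and the names `SAM_FLAG_BITS` and `explain_flag` are made up for this illustration and are not part of samtools or any library.

```python
# Bit values as listed in the SAM specification's FLAG table.
# The descriptions are paraphrased; the names below are illustrative only.
SAM_FLAG_BITS = [
    (0x1, "read is paired"),
    (0x2, "read mapped in a proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return (bit value, description) for every bit that is set in flag."""
    return [(bit, desc) for bit, desc in SAM_FLAG_BITS if flag & bit]


# 99 = 64 + 32 + 2 + 1, so four bits are set.
for bit, desc in explain_flag(99):
    print(f"{bit:>4} (0x{bit:x}): {desc}")
```

Each set bit contributes its value to the sum, which is why a single integer in column 2 can carry several independent yes/no answers at once.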
Thanks Heng for the comment! Just today I found what is, for me, the best explanation of these flags so far, along with some tips on how to deal with them in Python and Perl scripts: http://blog.nextgenetics.net/?e=18 I spent a long time searching for this info. For a simple-minded biologist, a simple letter code has its advantages.
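As a rough idea of what dealing with the flags in a Python script can look like, the sketch below reads the FLAG from column 2 of a SAM alignment line and tests single bits with the bitwise AND operator. This is not taken from the linked post; the helper `read_flag`, the constant names, and the example lines are invented for illustration, while the bit values 0x4 (read unmapped) and 0x10 (read on reverse strand) come from the SAM specification.

```python
# Bit values from the SAM specification; the constant names are illustrative.
UNMAPPED = 0x4
REVERSE_STRAND = 0x10


def read_flag(sam_line):
    """Return the integer FLAG (column 2) of a tab-separated SAM alignment line."""
    return int(sam_line.split("\t")[1])


# Made-up alignment lines, for illustration only.
lines = [
    "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "read2\t83\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
]

for line in lines:
    flag = read_flag(line)
    # A bit is set when ANDing the flag with that bit gives a non-zero result.
    is_unmapped = bool(flag & UNMAPPED)
    is_reverse = bool(flag & REVERSE_STRAND)
    print(flag, "unmapped:", is_unmapped, "reverse strand:", is_reverse)
```

Testing one bit at a time this way avoids having to memorise which combined integers (99, 83, 147, and so on) correspond to which situations.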
If you are new to bioinformatics and are asked to work with a SAM file, you might reasonably turn to this documentation to better understand how the format works.
I feel that many people have trouble understanding what is meant by bitwise FLAG values. The documentation is very technical and not very transparent to people who may be new to bioinformatics.
Many people might be turning to the documentation after looking at their SAM output file. Maybe they see that their output file has a range of integer values in column 2 and are puzzled by the explanation in the documentation (this is very likely if you have no familiarity with bit patterns).
I think this section would be greatly helped by the following: