samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Tag type typos in CRAM "Encoding tags" section #112

Closed: jmarshall closed this issue 8 years ago

jmarshall commented 8 years ago

In the v3 spec (and similarly in the v2.1 spec), §8.5 (Slice header block) says that tag types are the same as BAM ([AfZHcCsSiIB]), but the example in §8.4 (Encoding tags) has

For example AMiOQz\0OQz\0, where the TD consists of just two values: integer 0 for tags {AM:i,OQ:z} and 1 for tag {OQ:z}.

However, the string tag type is uppercase Z, so OQ is OQ:Z in BAM. Hopefully this is just a typo in the spec text, and tags appear with an uppercase Z in actual CRAM files…?
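To make the check concrete, here is a minimal Python sketch that splits a TD byte string into its tag lines and flags any type character outside the BAM set quoted above. It assumes the layout implied by the spec's example: each tag line is a run of 3-byte entries (two-character tag plus one type character) terminated by a NUL byte. The function names `parse_tag_dictionary` and `invalid_types` are hypothetical and not part of any library.

```python
# Tag type characters as listed for BAM in the issue text: [AfZHcCsSiIB].
BAM_TAG_TYPES = set("AfZHcCsSiIB")


def parse_tag_dictionary(td: bytes):
    """Split a TD byte string into tag lines.

    Assumed layout (taken from the spec example, not verified against real
    files): every tag line is a sequence of 3-byte entries and ends with NUL.
    Returns a list of tag lines, each a list of (tag, type_char) pairs.
    """
    if td and not td.endswith(b"\0"):
        raise ValueError("TD does not end with a NUL terminator")
    lines = []
    # split() yields one empty trailing piece after the final NUL; drop it.
    for raw in td.split(b"\0")[:-1]:
        if len(raw) % 3 != 0:
            raise ValueError(f"tag line length {len(raw)} is not a multiple of 3")
        entries = []
        for i in range(0, len(raw), 3):
            tag = raw[i:i + 2].decode("ascii")
            type_char = chr(raw[i + 2])
            entries.append((tag, type_char))
        lines.append(entries)
    return lines


def invalid_types(td: bytes):
    """Return every (tag, type_char) pair whose type is not a BAM tag type."""
    return [
        (tag, type_char)
        for line in parse_tag_dictionary(td)
        for tag, type_char in line
        if type_char not in BAM_TAG_TYPES
    ]
```

Run on the string as printed in the spec, `invalid_types(b"AMiOQz\0OQz\0")` reports both `OQ` entries, while the uppercase form `b"AMiOQZ\0OQZ\0"` passes cleanly.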

jkbonfield commented 8 years ago

Yes, they're uppercase. An example from cram_dump:

      Tag encoding map:
        SMc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 53}
        QTZ =>  BYTE_ARRAY_STOP {9, 56}
        BCZ =>  BYTE_ARRAY_STOP {9, 50}
        XAZ =>  BYTE_ARRAY_STOP {9, 60}
        ahc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 64}
        XCc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 61}
        XGc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 51}
        AMc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 52}
        XMc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 54}
        a3c =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 49}
        XOc =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 55}
        X0s =>   BYTE_ARRAY_LEN {3, 4, 1, 2, 1, 0, 1, 1, 58}
        X0c =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 47}
        X0C =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 63}
        X1c =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 59}
        X1C =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 62}
        X1s =>   BYTE_ARRAY_LEN {3, 4, 1, 2, 1, 0, 1, 1, 48}
        XTA =>   BYTE_ARRAY_LEN {3, 4, 1, 1, 1, 0, 1, 1, 57}

Thanks for spotting the typo. I'll fix it.

jkbonfield commented 8 years ago

PS. Yes, there is some oddity there: X1c and X1C appear as separate entries when they could probably be forcibly merged. I think Cramtools does a better job of rationalising these where appropriate (perhaps due to some common sanitizer in htsjdk). It's a side issue, though.
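For anyone wanting to spot such cases, a rough Python sketch follows. It takes the three-character keys from a tag encoding map like the one listed earlier and groups them by tag name, reporting tags that occur under more than one integer type. The helper name `tags_with_multiple_integer_types` is hypothetical, and treating `cCsSiI` as the integer types is an assumption carried over from the BAM type list; this only detects candidates and does not attempt any merging.

```python
from collections import defaultdict

# Integer type characters among the BAM tag types [AfZHcCsSiIB] (assumption).
INTEGER_TYPES = set("cCsSiI")


def tags_with_multiple_integer_types(keys):
    """Map each tag to its integer type characters, keeping only tags
    that appear under more than one integer type.

    `keys` are 3-character strings: a 2-character tag plus a type character.
    """
    by_tag = defaultdict(list)
    for key in keys:
        if len(key) != 3:
            raise ValueError(f"expected a 3-character key, got {key!r}")
        tag, type_char = key[:2], key[2]
        # Record each integer type once per tag, in order of first appearance.
        if type_char in INTEGER_TYPES and type_char not in by_tag[tag]:
            by_tag[tag].append(type_char)
    return {tag: types for tag, types in by_tag.items() if len(types) > 1}


# Keys copied from the cram_dump listing in this thread.
keys = [
    "SMc", "QTZ", "BCZ", "XAZ", "ahc", "XCc", "XGc", "AMc", "XMc",
    "a3c", "XOc", "X0s", "X0c", "X0C", "X1c", "X1C", "X1s", "XTA",
]
duplicates = tags_with_multiple_integer_types(keys)
```

On the listing above, only `X0` and `X1` are reported, each under the three types `c`, `C` and `s`.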