samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Validate that SAM header tag keys are exactly 2 characters long #1561

Closed: jmarshall closed this 3 years ago

jmarshall commented 3 years ago

Description

As reported in #1477, the SAM specification (§1.3) requires that the keys of header tagged fields match /^[A-Za-z][A-Za-z0-9]$/, but HTSJDK does not enforce this and even accepts keys that are not two characters long.
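As an illustration of the rule the specification states, here is a minimal Java sketch of a full key check (length and characters). The class and method names (`HeaderTagKeyCheck`, `isValidKey`) are hypothetical and not part of htsjdk.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not htsjdk API): checks a SAM header tag key against the spec's pattern. */
public final class HeaderTagKeyCheck {
    // SAM spec §1.3: a letter followed by a letter or digit, exactly two characters in total.
    private static final Pattern KEY_PATTERN = Pattern.compile("[A-Za-z][A-Za-z0-9]");

    private HeaderTagKeyCheck() {}

    /** Returns true if key matches /^[A-Za-z][A-Za-z0-9]$/. */
    public static boolean isValidKey(String key) {
        // matches() must consume the whole input, so the ^...$ anchors are implicit.
        return key != null && KEY_PATTERN.matcher(key).matches();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"SO", "VN", "M5", "S", "SOX", "1A", "S_"}) {
            System.out.println(key + " -> " + isValidKey(key));
        }
    }
}
```

Using `Matcher.matches()` rather than `find()` means the entire key must match the pattern, which covers both the length requirement and the character requirement in one test.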

This PR adds header tag key checking to SAMTextHeaderCodec's validation. It adds a new HEADER_TAG_INVALID_KEY entry to SAMValidationError, documented as covering keys that are invalid either because they contain invalid characters or because they are the wrong length. However, the PR implements only the length check; it does not yet check which characters a key contains.

You may prefer either to remove the forward-looking mention of invalid characters from the HEADER_TAG_INVALID_KEY documentation comment, or to implement a character validity check (and add further test cases).
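To make the two failure reasons concrete, the following hypothetical sketch (not htsjdk code; `HeaderLineKeyValidator` and `validate` are illustrative names) checks the keys in one tab-separated SAM header line and reports length problems separately from character problems. It assumes tagged fields of the form KEY:VALUE and skips @CO lines, which carry free text rather than tagged fields.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not htsjdk code): finds invalid tag keys in one SAM text header
 * line, reporting the two failure reasons HEADER_TAG_INVALID_KEY is documented to cover.
 */
public final class HeaderLineKeyValidator {

    private HeaderLineKeyValidator() {}

    /** Returns one message per problem found in a tab-separated line such as "@HD\tVN:1.6". */
    public static List<String> validate(String headerLine) {
        List<String> problems = new ArrayList<>();
        String[] fields = headerLine.split("\t");
        // @CO lines carry free text, not KEY:VALUE fields, so there are no keys to check.
        if (fields[0].equals("@CO")) {
            return problems;
        }
        // fields[0] is the record type (e.g. "@HD"); tagged fields follow as KEY:VALUE.
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i];
            int colon = field.indexOf(':');
            if (colon < 0) {
                problems.add("field '" + field + "' has no ':' separator");
                continue;
            }
            String key = field.substring(0, colon);
            if (key.length() != 2) {
                // Length check: the part this PR implements.
                problems.add("key '" + key + "' has length " + key.length() + ", expected 2");
            } else if (!isAsciiLetter(key.charAt(0))
                    || !(isAsciiLetter(key.charAt(1)) || isAsciiDigit(key.charAt(1)))) {
                // Character check: the part this PR does not implement.
                problems.add("key '" + key + "' does not match [A-Za-z][A-Za-z0-9]");
            }
        }
        return problems;
    }

    // ASCII-only tests; Character.isLetter would also accept non-ASCII letters.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(validate("@HD\tVN:1.6\tSO:coordinate")); // prints []
        System.out.println(validate("@HD\tVNN:1.6\tSO:coordinate"));
        System.out.println(validate("@SQ\tSN:chr1\t1N:5"));
    }
}
```

Keeping the two checks in separate branches lets a caller tell a wrong-length key apart from a key with disallowed characters, even though both would map to the single HEADER_TAG_INVALID_KEY type.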


nh13 commented 3 years ago

@cmnbroad any objection from your side?