samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
284 stars 243 forks source link

The SamHeader system is vulnerable to tag "injection" when writing to SAM #1108

Open yfarjoun opened 6 years ago

yfarjoun commented 6 years ago

One can set a header attribute (like DS) to be "text with INJECTION\tPI:123" and then the output (Sam or Bam) file that is written will have a field PI:123.

d-cameron commented 6 years ago

It's not just the header that is vulnerable to having illegal characters injected. No checks are done on the record fields either. Since htsjdk is not VCFv4.3 aware, it does no encoding of illegal character in info or format field (which is fine) but it also does not prevent those characters being written so it just outputs non-compliant records.

d-cameron commented 6 years ago

Sorry, I'm mixing up my specs documents here. The issue applies to SAM record tags as well. TextTagCodec.java:82 just does sb.append(tagType).append(':').append(value.toString()); when emitting string fields.

No validation is done in VCF either (see #1063 for a example of this being a problem with GATK output) but presumably that's a different issue.