A number of the BAM files for the QCROC colorectal exome dataset have a malformed header that prevents me from running GATK on them, even with "--validation-strictness LENIENT" or "--validation-strictness SILENT". The problem is the @PG line in the header, which reads
@PG ID:0 VN: CL:bwa aln; bwa sampe
instead of the correct line found in the remaining BAM files:
@PG ID:0 VN:0.5.7 CL:bwa aln; bwa sampe
The value of the VN (version number) field is missing in the malformed files. The affected files are
I have informed Ryan of this, and he said he would email the data people to try to resolve the issue so that I don't have to duplicate the BAMs. The way to fix this would be
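
For illustration only, here is a minimal Python sketch of how an empty VN value in a @PG line could be detected and filled in. It is one possible approach under stated assumptions, not necessarily the fix intended above. It assumes the header text has already been dumped to a string (for example with `samtools view -H`), that fields are tab-separated as the SAM format requires, and that 0.5.7 is the right value because that is what the intact files report. The names `has_empty_pg_version`, `fix_pg_version`, and `KNOWN_BWA_VERSION` are hypothetical.

```python
import re

# Assumption: the version reported by the intact BAM headers in this dataset.
KNOWN_BWA_VERSION = "0.5.7"

# A VN field with no value: "VN:" preceded by a tab and followed by a tab or end of line.
EMPTY_VN = re.compile(r"(?<=\t)VN:(?=\t|$)")


def has_empty_pg_version(header_text):
    """Return True if any @PG line carries a VN field with no value."""
    return any(
        line.startswith("@PG") and EMPTY_VN.search(line)
        for line in header_text.splitlines()
    )


def fix_pg_version(header_text, version=KNOWN_BWA_VERSION):
    """Return the header text with empty VN values on @PG lines filled in."""
    fixed = []
    for line in header_text.splitlines():
        # Only @PG lines are touched; @HD also has a VN field and must be left alone.
        if line.startswith("@PG"):
            line = EMPTY_VN.sub(lambda _m: "VN:" + version, line)
        fixed.append(line)
    return "".join(line + "\n" for line in fixed)
```

The patched header text could then be applied with `samtools reheader`, which writes the reheadered BAM to standard output, so the original file is not modified unless it is explicitly replaced.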