rmcclosk / morin-rotation

Rotation with Morin lab.

Corruption of BAM files #2

Closed by rmcclosk 10 years ago

rmcclosk commented 10 years ago

A number of the BAM files for the QCROC colorectal exome dataset have a malformed header that prevents me from running GATK on them, even with "--validation-strictness LENIENT" or "--validation-strictness SILENT". The problem is the @PG line in the header, which reads @PG ID:0 VN: CL:bwa aln; bwa sampe instead of the correct line found in the remaining BAM files, @PG ID:0 VN:0.5.7 CL:bwa aln; bwa sampe. In the malformed files, the VN (version number) tag is present but its value is missing.
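One way to check whether a given header has this problem is sketched below. It assumes the header fields are tab-separated, as the SAM specification requires, so an empty version shows up as a field that is exactly VN:. The function name has_empty_pg_vn and the sample headers are made up for illustration.

```shell
# Succeeds (exit 0) if the SAM header on stdin has a @PG line whose VN field is empty.
# Hypothetical helper; assumes tab-separated header fields.
has_empty_pg_vn() {
    awk -F '\t' '
        /^@PG/ { for (i = 2; i <= NF; i++) if ($i == "VN:") found = 1 }
        END { exit !found }
    '
}

# Made-up headers mirroring the two forms quoted above
printf '@PG\tID:0\tVN:\tCL:bwa aln; bwa sampe\n' | has_empty_pg_vn && echo "malformed"
printf '@PG\tID:0\tVN:0.5.7\tCL:bwa aln; bwa sampe\n' | has_empty_pg_vn || echo "ok"
```

On a real file, the header would come from samtools, for example samtools view -H sample.bam | has_empty_pg_vn, where sample.bam is a placeholder name.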

The affected files are

I have informed Ryan of this, and he said he would email the data people to try to resolve the issue so that I don't have to duplicate the BAMs. The way to fix this would be:

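# Fill in the empty VN only on @PG lines; an unguarded s/VN:/VN:0.5.7/ would also alter @HD and intact @PG lines
samtools view -H [bamfile] | sed -E '/^@PG/ s/VN:([[:space:]]|$)/VN:0.5.7\1/' > fixed.header.sam
# samtools reheader writes the new BAM to stdout, so redirect it ([fixedbam] is a placeholder)
samtools reheader fixed.header.sam [bamfile] > [fixedbam]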

rmcclosk commented 10 years ago

Ryan has told me to fix these myself instead of waiting for a response from the data's curators.
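
A rough sketch of how the fix could be applied to several files at once follows. It is not necessarily what was actually run: the list file argument, the function names, and the ".fixed.bam" output suffix are all hypothetical. The substitution is limited to @PG lines whose VN value is empty, so other header lines and already-correct @PG lines are left alone.

```shell
# Repair a SAM header read on stdin: fill in an empty VN field on @PG lines only.
# 0.5.7 is the version reported for the intact files in this issue.
fix_pg_header() {
    sed -E '/^@PG/ s/VN:([[:space:]]|$)/VN:0.5.7\1/'
}

# Reheader every BAM named in a list file (one path per line).
# The list file and the ".fixed.bam" suffix are hypothetical choices.
reheader_all() {
    list_file=$1
    while IFS= read -r bam; do
        [ -n "$bam" ] || continue
        samtools view -H "$bam" | fix_pg_header > "$bam.header.sam"
        # samtools reheader writes the new BAM to stdout
        samtools reheader "$bam.header.sam" "$bam" > "${bam%.bam}.fixed.bam"
        rm -f "$bam.header.sam"
    done < "$list_file"
}
```

Calling reheader_all affected_bams.txt (a made-up file name) would leave the original BAMs untouched and write a corrected copy next to each one, at the cost of the duplicated storage mentioned earlier.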