Open srl147 opened 7 years ago
the links didn't insert correctly ftp://ngs.sanger.ac.uk/scratch/project/team117/srl/input2.bam ftp://ngs.sanger.ac.uk/scratch/project/team117/srl/fixed2.bam
So, by the SAM spec, it is kind of doing the right thing. A single '*' in the quality value stands for a lack of quality data. ClipAdaptor treats sequence and quality as text, so after clipping the single '*' gets interpreted as a lack of quality data and gets stored as a 255 in BAM format.
When ClipReinsert pulls the data out it reads the 255, adds 33 to get the right ascii value and so (because it is effectively a char type) becomes a space (ascii 32).
A space is not a valid quality value and causes bam12auxmerge to fail. It is an interesting edge case.
After some discussion we suggest the following as a simple fix. The code which clips adapter already checks for reads that are 100% adapter, in which case it will not clip the read. Can this check be extended to also skip reads that would contain only one base after clipping.
That looks like the simplest fix. I will get that done.
If after clipping the sequence in a bam file has a single base with a quality of 9, which corresponds to a '*' in the sam format, bamclipreinsert doesn't restore the quality values properly
I've created two bam files, containing a single cluster, which illustrate the problem. input2.bam in which the clipped forward read has a single base with a quality value of 9 and fixed2.bam where the quality value has been changed to 10.
Compare bamclipreinsert < input2.bam | samtools view bamclipreinsert < fixed2.bam | samtools view
We actually noticed this error because bam12auxmerge failed with the error 'Invalid quality value', when we tried re-inserting the clipped bases. The sam representation of the resulting bam file is somewhat different to that produced by bamclipreinsert but I assume the cause is the same.