walaj / SeqLib

C++ htslib/bwa-mem/fermi interface for interrogating sequence data
http://bioinformatics.oxfordjournals.org/content/early/2016/12/21/bioinformatics.btw741.full.pdf+html
Other
133 stars 36 forks source link

AlignmentPosition() is wrong with hard clips #46

Closed Ahhgust closed 5 years ago

Ahhgust commented 5 years ago

There's a bug in the AlignmentPosition code in the BamRecord object (code below). Hard clipping doesn't impact the start position in the read (removing the other half of the or statement should make the code correct)-- those bases are just missing from the reported read.

e.g., I have a read that looks like: 8:32868238-32874773_1151_1695_2:0:0_3:0:0_3cb 2048 1 1151 0 90H60M 0 -1 0 ACAGCTTAAAACTCAAAGGACTTGGCAGTGGTTTATATCCCTCTAGAGGAGCCTGTTCTA

For which the bamrecord AlignmentPosition() and AlignmentPositionEnd() is 90 and 60, respectively.

/* Get the start of the alignment on the read, by removing soft-clips / inline int32_t AlignmentPosition() const { uint32_t* c = bam_get_cigar(b); int32_t p = 0; for (size_t i = 0; i < b->core.n_cigar; ++i) { // Remove the code after the || if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H')) p += bam_cigar_oplen(c[i]); else // not a clip, so stop counting break; } return p; }

Ahhgust commented 5 years ago

On second thought, it's (theoretically) possible to have both hard and soft clipping on the same end of the read. I don't know why you'd do that, but perhaps: if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H')) p += bam_cigar_oplen(c[i]);

should be

if ( (bam_cigar_opchr(c[i]) == 'S') p += bam_cigar_oplen(c[i]); else if ( bam_cigar_opchr(c[i]) != 'H') break;

walaj commented 5 years ago

Good catch. I agree that this is the right solution -- just pushed the commit.

Ahhgust commented 5 years ago

Thanks! It looks like different flavors of the same bug are found elsewhere. Similar patches are needed at lines: 612 627 656 Thanks! -August