Closed cflerin closed 3 years ago
Hi @cflerin, I finally got some time to look into this. You're right about the soft clipping. I honestly don't know why I added that, but it seems to be a mistake. I've fixed it now, thanks for raising the issue.
Hi @timoast, thanks for looking at this. After posting this here, I also discovered that cellranger atac does the same adjustment for soft clipping in the fragments file, and I'm not totally sure why that is. But it does seem like the soft clipping adjustment isn't technically correct according to the SAM spec, so it's probably best that it's not done. Thanks for the fix.
Yes, I think that's what I was looking at originally when writing this, and I copied over the same logic. But I agree that according to the SAM spec, POS should be the reference genome position of the first aligned base, so we shouldn't need to adjust for soft-clipped bases.
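To make that reading of the spec concrete, here is a minimal sketch in plain Python (no pysam). The helper names `aligned_interval` and `leading_soft_clip` are hypothetical and are not part of sinto. The sketch assumes a 1-based POS as stored in a SAM record and shows that soft-clipped bases never enter the reference interval:

```python
import re

# Hypothetical helpers for illustration only; not sinto's or pysam's API.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that consume reference bases


def aligned_interval(pos, cigar):
    """Return the 0-based, half-open reference interval covered by a read.

    `pos` is the 1-based POS field of the SAM record. Soft clips (S) do not
    consume the reference, so they are not subtracted from the start.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(length for length, op in ops if op in REF_CONSUMING)
    start = pos - 1  # convert 1-based POS to a 0-based start
    return start, start + ref_len


def leading_soft_clip(cigar):
    """Return the number of soft-clipped bases at the start of the read."""
    for length, op in CIGAR_OP.findall(cigar):
        if op == "H":  # hard clips may precede a soft clip
            continue
        return int(length) if op == "S" else 0
    return 0


if __name__ == "__main__":
    # A read at POS 1 with 32 soft-clipped bases followed by 21 aligned bases.
    print(aligned_interval(1, "32S21M"))
    print(leading_soft_clip("32S21M"))
```

With this interpretation the clipped length is still available if it is needed for something else, but the reference start comes straight from POS.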
Hi @timoast,
I have spotted a few cases where there are negative positions listed in the fragments file, especially in chrM, I think due to the higher chance of finding reads overlapping the start/end of this chromosome. I tried to make a minimal example here:
Using the reads in this BAM file:
I run
sinto fragments
and get:
This read aligns to chrM position 1 and is soft-clipped by 32 bases (CIGAR: 32S21M). Looking at the code, I see there is a correction for soft-clipping: https://github.com/timoast/sinto/blob/b57d735b78dc0742eccb4a4ba801a24f577fca35/sinto/fragments.py#L330-L338 and from this it makes sense how we get to -28 (0 - 32 + 4), but I don't quite understand why this correction is applied. I thought that the position reported in the BAM is the start of the mapped portion of the read, which already takes any soft-clipped portions into account. Looking at this read in IGV, and using
bedtools bamtobed
seems to confirm this (though without the Tn5 offset).
Looking at another soft-clipped read, this time in the middle of chr1:
bam:
fragments:
where it seems like the start position should be 3012666 + 4 = 3012670. Am I missing something about the soft-clipping correction?
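The arithmetic in both examples can be sketched as follows. This is a minimal illustration, not sinto's actual code: the function `fragment_start` and its parameters are hypothetical, the +4 offset is the one used in the numbers above, and the start positions are assumed to be 0-based as in the (0 - 32 + 4) calculation.

```python
TN5_SHIFT = 4  # offset added to the read start, as in the arithmetic above


def fragment_start(aligned_start, soft_clip=0, adjust_soft_clip=False):
    """Compute a fragment start from a 0-based aligned read start.

    Hypothetical helper: `adjust_soft_clip=True` reproduces the correction
    being questioned, and `False` uses the aligned start unchanged.
    """
    start = aligned_start
    if adjust_soft_clip:
        start -= soft_clip  # the extra subtraction that yields negative starts
    return start + TN5_SHIFT


if __name__ == "__main__":
    # chrM read: aligned start 0, CIGAR 32S21M
    print(fragment_start(0, soft_clip=32, adjust_soft_clip=True))  # -28
    print(fragment_start(0, soft_clip=32))                         # 4
    # chr1 read: only the expected value without the correction is known
    print(fragment_start(3012666))                                 # 3012670
```

Without the soft-clip subtraction, a read aligned at the very start of a chromosome can no longer produce a negative fragment coordinate from this step.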