samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
680 stars 240 forks source link

Remove the mpileup BAM_CREF_SKIP filter. #2281

Open jkbonfield opened 2 months ago

jkbonfield commented 2 months ago

Mpileup removes alignments using the cigar ref skip operator ("N"). This was originally added in 2011 in samtools/samtools#d1643d6 with the commit message of "fixed a bug in indel calling related to unmapped and refskip reads".

Unfortunately I don't know what that bug was, but removing the code shows it still works (at least for some data!). We need better understanding of what's going on and why it was added, so perhaps we should add a command line option to control this instead?

Fixes #2277

jkbonfield commented 2 months ago

See also #2282 as an alternative to this PR. That is probably the way to go for compatibility, but then again I doubt anyone is currently using bcftools on RNASeq data with ref-skip based alignments given all such alignments were simply discarded, so it's unlikely the change in behaviour would break any currently working pipelines.

pd3 commented 1 month ago

I believe the edited BAM_CREF_SKIP checking code https://github.com/samtools/bcftools/blob/ef8b974dee6d3bc9f56e39a75997a64a13cbcaa1/bam2bcf_indel.c#L855-L859 simply skips these reads from realignment. I am imagining a proper fix would be to split the reads into parts and deal with each separately. Unfortunately the current framework might not be bent easily for that.

If we should merge any version of this, we'd need to include a small test to demonstrate where it can be helpful.