samtools / bcftools

This is the official development repository for BCFtools. See the installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Issue in variant decomposition #2155

Closed: 46paillettes closed this issue 5 months ago

46paillettes commented 5 months ago

Hello,

I am using bcftools to decompose variants and compare them to public data. However, I noticed that in one specific case bcftools seems to get the decomposition wrong.

Variant view in IGV: (screenshot not available)

Public sample used: NA24631

Problematic variant confirmed in public data:
19 41759631 ACCC A
19 41759637 C T

Phased representation detected in the data: 19 41759634 CCCC T

Decomposition by bcftools:
19 41759631 ACCC A
19 41759634 C T

The first INDEL is decomposed correctly. However, the SNP cannot be located at position 41759634, as that would mean the SNP and the INDEL are not on the same allele (and thus cannot be a phased variant). IMO, the SNP should be at position 41759635 (of course, the SNP could be at any position from 41759635 to 41759637, since the anchor is not defined in the phased representation).
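The positional argument above can be checked with a minimal sketch, assuming the usual VCF convention that the first REF base of a deletion record is an unchanged anchor base. Under that assumption, ACCC→A at 41759631 removes 41759632 to 41759634, so a SNP at 41759634 lands on a deleted base while 41759635 to 41759637 do not. The helper `deleted_positions` is hypothetical and only handles simple left-anchored deletions; it checks positions, not haplotype equivalence.

```python
# Minimal sketch (hypothetical helper): which 1-based positions does a
# simple left-anchored VCF deletion remove? Assumes REF[0] == ALT is the anchor.

def deleted_positions(pos: int, ref: str, alt: str) -> range:
    """Return the reference positions removed by a left-anchored deletion."""
    if len(alt) != 1 or len(ref) < 2 or not ref.startswith(alt):
        raise ValueError("not a simple left-anchored deletion")
    # The anchor base at `pos` is kept; the following len(ref) - 1 bases are removed.
    return range(pos + 1, pos + len(ref))


deletion = deleted_positions(41759631, "ACCC", "A")
for snp_pos in (41759634, 41759635, 41759636, 41759637):
    status = "inside deletion" if snp_pos in deletion else "outside deletion"
    print(snp_pos, status)
# prints "41759634 inside deletion", then "outside deletion" for 41759635 to 41759637
```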

Could you tell me whether I am wrong or whether there is indeed a bug?

Thanks a lot for the help.
Regards,
Jonathan Bernard

Here are the files and commands I used:

Test VCF (+ tabix index): test.vcf.gz

Result: test_output.vcf.gz

Commands used:
tabix -p vcf test.vcf.gz
bcftools norm test.vcf.gz -w 0 -c e -f human_g1k_v37.fasta -m +both -a --old-rec-tag OLD_REP > test_output.vcf

pd3 commented 5 months ago

As far as I can see, the program is doing the right thing. The alignment reconstructed from the VCF and the alignment shown in the IGV plot lead to the exact same haplotype. It is not possible to tell which of the C's got deleted and which one was replaced with T.

TTCAGGGACCCCCCCCCCCCA  .. ref
TTCAGGGACCT---CCCCCCA  .. VCF
TTCAGGGA---CCTCCCCCCA  .. screenshot
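pd3's point that both alignments give the same haplotype can be checked with a minimal sketch. It assumes the 21 bp window in the alignment starts at 19:41759624, so that the A just before the C run is 41759631, matching the VCF records. It applies the public-data records and the phased CCCC→T record to that window and strips the gaps from the two alignment rows. The helper `apply_records` is hypothetical: it only handles sorted, non-overlapping records whose REF allele matches the window.

```python
# Minimal sketch: apply VCF-style records to a small reference window and
# compare the resulting haplotypes. Assumption: the window starts at 19:41759624.

WINDOW_START = 41759624
REF = "TTCAGGGA" + "C" * 12 + "A"  # 21 bp, with the 12-base C run from the alignment


def apply_records(ref, start, records):
    """Apply sorted, non-overlapping (pos, ref_allele, alt_allele) records to ref."""
    out = []
    cursor = 0  # 0-based index of the next reference base not yet consumed
    for pos, ref_allele, alt_allele in records:
        i = pos - start
        if i < cursor:
            raise ValueError(f"record at {pos} overlaps the previous record")
        if ref[i:i + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at {pos}")
        out.append(ref[cursor:i])   # unchanged bases before this record
        out.append(alt_allele)      # the record's ALT allele
        cursor = i + len(ref_allele)
    out.append(ref[cursor:])        # unchanged bases after the last record
    return "".join(out)


public = apply_records(REF, WINDOW_START, [(41759631, "ACCC", "A"), (41759637, "C", "T")])
phased = apply_records(REF, WINDOW_START, [(41759634, "CCCC", "T")])
vcf_row = "TTCAGGGACCT---CCCCCCA".replace("-", "")
screenshot_row = "TTCAGGGA---CCTCCCCCCA".replace("-", "")

print(public)                                          # TTCAGGGACCTCCCCCCA
print(public == phased == vcf_row == screenshot_row)   # True
```

This simple helper rejects the bcftools output pair (41759631 ACCC A plus 41759634 C T) as overlapping, because the ACCC REF allele already covers 41759634. That positional overlap is what the report raises; pd3's VCF row shows the reading under which the two records still yield the same haplotype.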