Closed: 46paillettes closed this issue 5 months ago
As far as I can see, the program is doing the right thing. The alignment reconstructed from the VCF and the alignment shown in the IGV plot lead to the exact same haplotype. It is not possible to tell which of the C's got deleted and which one was replaced with T.
TTCAGGGACCCCCCCCCCCCA .. ref
TTCAGGGACCT---CCCCCCA .. VCF
TTCAGGGA---CCTCCCCCCA .. screenshot
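For anyone who wants to check this without IGV, here is a minimal sketch that applies both representations to the reference segment above and compares the resulting haplotypes. The helper `apply_variants` and the `START` coordinate are hypothetical and not part of bcftools; `START` assumes the `A` anchor of `ACCC` (the eighth base of the segment) sits at 41759631, as in the report.

```python
def apply_variants(ref, start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) records to ref.

    ref is the reference segment; start is the 1-based coordinate of ref[0].
    """
    out = []
    cursor = 0  # 0-based offset of the next reference base not yet consumed
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - start
        if offset < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous one")
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at {pos}")
        out.append(ref[cursor:offset])  # unchanged bases before the variant
        out.append(alt_allele)          # substitute ALT for REF
        cursor = offset + len(ref_allele)
    out.append(ref[cursor:])            # unchanged tail
    return "".join(out)


REF = "TTCAGGGACCCCCCCCCCCCA"
START = 41759624  # assumption: places the A of ACCC at 41759631

# Two-record representation from the public data (deletion, then SNP)
public = [(41759631, "ACCC", "A"), (41759637, "C", "T")]
# Single-record phased representation (MNP-like CCCC>T)
phased = [(41759634, "CCCC", "T")]

hap_public = apply_variants(REF, START, public)
hap_phased = apply_variants(REF, START, phased)
print(hap_public == hap_phased)
```

Both calls produce the same sequence, which is why the position of the deletion within the run of C's cannot be determined from the haplotype alone.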
Hello,
I am using bcftools to decompose variants and compare them to public data. However, I noticed that in one specific case, bcftools seems to get the decomposition wrong.
Variant view in IGV:
Public sample used: NA24631
Problematic variant confirmed in public data:
19 41759631 ACCC A
19 41759637 C T
Phased representation detected in the data:
19 41759634 CCCC T
Decomposition by bcftools:
19 41759631 ACCC A
19 41759634 C T
The first INDEL is decomposed correctly. However, the SNP cannot be located at position 41759634, because that base is already removed by the deletion: the SNP and the INDEL would then not be on the same allele, so they could not form a phased variant. IMO, the SNP should be at position 41759635 (of course, the SNP can be at any position between 41759635 and 41759637, as the anchor is not defined in the phased representation).
Could you tell me if I am wrong or if there is indeed a bug?
Thanks a lot for the help, Regards, Jonathan Bernard
Here are the files and commands I used:
test vcf (+ tabix index): test.vcf.gz
result: test_output.vcf.gz
Commands used:
tabix -p vcf test.vcf.gz
bcftools norm test.vcf.gz -w 0 -c e -f human_g1k_v37.fasta -m +both -a --old-rec-tag OLD_REP > test_output.vcf