Closed ksw9 closed 5 years ago
The basic assumption of the bcftools merge
tool is that the input VCFs represent variants with respect to the same reference. In other words, the REF columns must match. Try bcftools norm --check-ref e --fasta-ref <ref.fa>
to check that this is the case. With bcftools norm -c x
it is possible to exclude mismatching records. However, you should make sure you understand the reasons of the discrepancies to prevent random nonsense results.
Fantastic, thank you for your help! It turns out I was using bcftools norm with an older version of bcftools. Problem resolved. Thank you for the quick help! Best,
Hi, I've just realized that an error persists for me when I use bcftools norm. When I use bcftools norm, I find strange behavior around an indel. This changes the positions in such a way that position 29483 comes before 29482. Any suggestions for how to deal with this would be great. Thank you!
Versions: bcftools 1.9-105-gaf6f0c9 Using htslib 1.9-118-g2da4c7d
# Original file
NC_000962.3 26960 . C . . . END=29483;MinDP=2 GT:DP 0:2
NC_000962.3 29484 . AA A 228 . AC=1;ADF=5,57;ADR=2,44;AN=1;DP=108;DP4=5,2,57,44;IDV=101;IMF=0.935185;INDEL;MQ=60;MQ0F=0;MQSB=1;QD=2.11;SGB=-0.693147;VDB=0.997587 GT:AD:DP:PL:SP 1:7,101:108:255,0:2
# After ${BCFTOOLS} norm --fasta-ref ${ref} ${ann_vcf} -O z -o ${norm_vcf}
# Lines total/split/realigned/skipped: 3169/0/150/0
NC_000962.3 26960 . C . . . END=29483;MinDP=2 GT:DP 0:2
NC_000962.3 29482 . CA C 228 . AC=1;ADF=5,57;ADR=2,44;AN=1;DP=108;DP4=5,2,57,44;IDV=101;IMF=0.935185;INDEL;MQ=60;MQ0F=0;MQSB=1;QD=2.11;SGB=-0.693147;VDB=0.997587 GT:AD:DP:PL:SP 1:7,101:108:255,0:2
This is the intended behavior, the program trims and left-aligns indels.
Okay, thanks - is there then a best practice for how to resolve the issue that the homozygous - ref block ends after the next indel? Ie the block from 26960 ends at 29483 and the next variant is at 29482? Because using this VCF, I am unable to merge and get errors that the VCF is out of order.
Thank you!
The result from bcftools norm
should be sorted. If not, it should be considered a bug. However, I assume bcftools
are not to be blamed for the unsorted records given the other discrepancies in the data you reported above. You can use bcftools sort
to sort the records.
Excellent, thanks again for helping me sort out this problem.
@ksw9 just checking if you remember how did you fix it? I tried with both --check-ref s --fasta-ref <ref.fa>
and --check-ref x --fasta-ref <ref.fa>
, but it gives me the following error, and the merged file contains only variants from Chr1.
Warning: The row size is too big for FORMAT/PL at chr1:23899810, requires 12204615120 bytes, skipping.
The REF prefixes differ: C vs N (1,1)
Failed to merge alleles at chr1:53128382 in HG01369.vcf.gz
Lines total/split/realigned/skipped: 62437/4488/0/0
REF/ALT total/modified/added: 88924/0/688
Hi, I am trying to merge VCFs across samples with bcftools merge and receiving the error: The REF prefixes differ: AGGGG vs ACGGCGGC (5,8). Indeed, in the single sample calls, the REF alleles do differ. Is there a best approach to dealing with this? I ran bcftools norm before attempting to merge the VCFs.
Thank you in advance!
Version information: bcftools 1.9-105-gaf6f0c9 Using htslib 1.9-118-g2da4c7d