samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Incorrect number of AD fields (2) #711

Closed JRodrigoF closed 6 years ago

JRodrigoF commented 6 years ago

Hi,

bcftools merge shows a (wrong?) behaviour when it deals with records whose genotypes are all either reference or missing, as in the example below:

File 1 extract:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT V00055 V00082
Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0

File 2 extract:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GRC001622 GRC004194
Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0

~/bcftools-1.6/bcftools merge FILE_1.vcf.gz FILE_2.vcf.gz

##fileformat=VCFv4.2
##FILTER=
##ALT=
##FILTER=
##FILTER=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##INFO=
##INFO=
##INFO=
##INFO=
##reference=file:///mmg/jflores/resources/human_reference/Homo_sapiens_assembly19.fasta
##contig=
##bcftools_mergeVersion=1.6+htslib-1.6
##bcftools_mergeCommand=merge FILE_1.vcf.gz FILE_2.vcf.gz; Date=Fri Nov 10 17:18:38 2017
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT V00055 V00082 GRC001622 GRC004194

Incorrect number of AD fields (2) at Y:2600000, cannot merge.

In contrast, vcf-merge behaves as expected, if I'm correct:

vcf-merge FILE_1.vcf.gz FILE_2.vcf.gz

##source_20171110.1=vcf-merge(v0.1.14-12-gcdb80b8) FILE_1.vcf.gz FILE_2.vcf.gz
##sourceFiles_20171110.1=0:FILE_1.vcf.gz,1:FILE_2.vcf.gz
##INFO=
##INFO=
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT V00055 V00082 GRC001622 GRC004194
Y 2600000 . N . . . AC=0;AN=0;SF=0,1 GT:DP:RGQ:AD .:0:0:0,0 .:0:0:0,0 .:0:0:0,0 .:0:0:0,0

Please let me know your observations.

Best,
Rodrigo

JRodrigoF commented 6 years ago

FILE_1.vcf.gz FILE_2.vcf.gz

pd3 commented 6 years ago

This is expected behavior. The AD tag is defined as Number=R in the header, which means one value per allele, including the reference. A record with REF=N and ALT=. has only one allele, so there should be only one value: AD=0, not AD=0,0.
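To find such records before merging, the Number=R rule can be checked with a few lines of Python. This is only a rough sketch, not part of bcftools: the function names `expected_r_count` and `check_number_r` are made up for illustration, and the parser assumes whitespace-separated columns as in the extracts pasted in this thread (real VCF files are tab-separated).

```python
def expected_r_count(alt: str) -> int:
    """Values a Number=R field should carry: one per allele, REF included."""
    # ALT "." means there is no alternate allele, so only REF is counted.
    n_alt = 0 if alt in (".", "") else len(alt.split(","))
    return 1 + n_alt


def check_number_r(line: str, tag: str = "AD"):
    """Return (sample_index, observed, expected) for each sample whose
    `tag` value count does not match the Number=R expectation."""
    # Assumption: columns are whitespace-separated, as pasted in this thread.
    cols = line.rstrip("\n").split()
    alt, fmt, samples = cols[4], cols[8].split(":"), cols[9:]
    if tag not in fmt:
        return []
    idx = fmt.index(tag)
    expected = expected_r_count(alt)
    problems = []
    for i, sample in enumerate(samples):
        fields = sample.split(":")
        if idx >= len(fields) or fields[idx] == ".":
            continue  # trailing field dropped or value missing: nothing to count
        observed = len(fields[idx].split(","))
        if observed != expected:
            problems.append((i, observed, expected))
    return problems


record = "Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0"
print(check_number_r(record))  # [(0, 2, 1), (1, 2, 1)]
```

Both samples in the reported record carry two AD values where one is expected, which matches the "Incorrect number of AD fields (2)" message.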