Open timothymillar opened 2 weeks ago
Related issue around supporting partial phasing in the VCF-Zarr spec: https://github.com/sgkit-dev/vcf-zarr-spec/issues/24
I think we'll need to wait on htslib and cyvcf2 support for this - presumably it'll be a while coming through the pipeline. I had a quick scan of the htslib issue tracker but didn't find anything.
What does bcftools view give for this VCF @timothymillar?
Good point, I don't think we can do anything for now. With the VCF:
##fileformat=VCFv4.4
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20201009
##source=.
##reference=./simple.fasta
##contig=<ID=CHR1,length=60>
##contig=<ID=CHR2,length=60>
##contig=<ID=CHR3,length=60>
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3
CHR1 2 . A T 60 PASS NS=3;AC=3 GT /1/1 /0/0 /0/0
CHR1 7 . A C 60 PASS NS=3;AC=4 GT /0/0 /0/1 /0/1
bcftools view (version 1.20) omits all of the records (nothing after #CHROM ...).
bcftools view (version 1.10.2) inserts an additional reference allele:
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 SAMPLE3
CHR1 2 . A T 60 PASS NS=3;AC=3 GT 0/1/1 0/0/0 0/0/0
CHR1 7 . A C 60 PASS NS=3;AC=4 GT 0/0/0 0/0/1 0/0/1
Hmm - that's not a great sign. I don't think this feature is going to get used much for a while.
The VCF 4.4 spec now allows for an initial symbol indicating the phasing of the first allele. For example, /0/1 is a valid genotype. At present, vcf2zarr raises on this input with: Couldn't read GT data: value not a number or '.' ...
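To make the shape of the new syntax concrete, here is a rough sketch of parsing a GT string that may start with a phasing symbol. This is not how vcf2zarr, cyvcf2 or htslib handle genotypes; parse_gt and both regexes are hypothetical names made up for illustration, and the sketch only records whether a leading symbol was present rather than applying any spec rule for inferring it when absent.

```python
import re

# Hypothetical helpers for illustration only (not part of vcf2zarr/cyvcf2/htslib).
# Whole-string shape: optional leading '/' or '|', then alleles joined by '/' or '|'.
_GT_RE = re.compile(r"[/|]?(?:\.|\d+)(?:[/|](?:\.|\d+))*")
# One token: the phasing symbol in front of an allele (may be empty) plus the allele.
_TOKEN_RE = re.compile(r"([/|]?)(\.|\d+)")


def parse_gt(gt):
    """Split a GT string into (alleles, phase_marks).

    alleles:     list of int, with None for a missing allele ('.').
    phase_marks: the symbol in front of each allele ('/' or '|'), with None
                 when the first allele has no leading symbol (pre-4.4 style).
    """
    if not _GT_RE.fullmatch(gt):
        raise ValueError(f"malformed GT value: {gt!r}")
    alleles, marks = [], []
    for sep, allele in _TOKEN_RE.findall(gt):
        marks.append(sep or None)
        alleles.append(None if allele == "." else int(allele))
    return alleles, marks


if __name__ == "__main__":
    for gt in ["/0/1", "0/1", "|0|1", "/1/1"]:
        print(gt, parse_gt(gt))
```

With this, /0/1 and 0/1 both come out as the diploid alleles [0, 1]; the only difference is whether the first phase mark is '/' or None, so nothing gets read as an extra allele.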