sgkit-dev / vcztools

Partial reimplementation of bcftools for VCF Zarr
Apache License 2.0

Filtering with samples not outputting INFO fields correctly #75

Closed Will-Tyler closed 2 weeks ago

Will-Tyler commented 2 weeks ago

Description

vcztools and bcftools produce different output for the following commands:

vcztools view -s NA00001 vcz_test_cache/sample.vcf.vcz
bcftools view -s NA00001 tests/data/vcf/sample.vcf.gz

vcztools does not appear to write the INFO fields correctly.

This issue also affects #67.

Root cause

vcztools view does not recalculate the AC and AN INFO fields like bcftools view does when the user specifies a sample selection.


Will-Tyler commented 2 weeks ago

bcftools adds new INFO fields (AC and AN) to the output when the sample selection option is used. Compare the output without and with -s NA00001:

...
##bcftools_viewCommand=view tests/data/vcf/sample.vcf.gz; Date=Tue Sep  3 16:51:39 2024
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA00001 NA00002 NA00003
19      111     .       A       C       9.6     .       .       GT:HQ   0|0:10,15       0|0:10,10       0/1:3,3
19      112     .       A       G       10      .       .       GT:HQ   0|0:10,10       0|0:10,10       0/1:3,3
...
...
##bcftools_viewCommand=view -s NA00001 tests/data/vcf/sample.vcf.gz; Date=Tue Sep  3 17:01:00 2024
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA00001
19      111     .       A       C       9.6     .       AC=0;AN=2       GT:HQ   0|0:10,15
19      112     .       A       G       10      .       AC=0;AN=2       GT:HQ   0|0:10,10
...

According to the documentation (see --no-update option), bcftools view recalculates some of the INFO fields when the user specifies a sample selection.
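
For illustration, here is a minimal Python sketch of the recalculation, assuming AC is the per-ALT allele count and AN is the number of called alleles among the selected samples. The function names (recompute_ac_an, subset_ac_an) are hypothetical and are not part of vcztools or bcftools; the sketch works on the text of a VCF record rather than on VCF Zarr arrays, and it does not merge the result into any existing INFO fields.

```python
import re


def recompute_ac_an(genotypes, n_alt):
    """Return (AC, AN) for a list of GT strings such as "0|0" or "0/1".

    AC holds one count per ALT allele; AN is the number of called
    (non-missing) alleles. Hypothetical helper, for illustration only.
    """
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # a missing allele counts toward neither AC nor AN
            an += 1
            index = int(allele)
            if index > 0:
                ac[index - 1] += 1
    return ac, an


def subset_ac_an(record, header_samples, keep):
    """Compute (AC, AN) for one VCF record restricted to the samples in `keep`."""
    fields = record.split()
    alt = fields[4]
    n_alt = 0 if alt == "." else len(alt.split(","))
    gt_pos = fields[8].split(":").index("GT")
    # Sample columns start at index 9, in header order.
    columns = [9 + header_samples.index(name) for name in keep]
    genotypes = [fields[c].split(":")[gt_pos] for c in columns]
    return recompute_ac_an(genotypes, n_alt)


samples = ["NA00001", "NA00002", "NA00003"]
record = "19 111 . A C 9.6 . . GT:HQ 0|0:10,15 0|0:10,10 0/1:3,3"
ac, an = subset_ac_an(record, samples, ["NA00001"])
print(f"AC={','.join(map(str, ac))};AN={an}")
```

For the record at position 111, selecting NA00001 (genotype 0|0) gives AC=0;AN=2, which matches the INFO column in the bcftools output above.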

Will-Tyler commented 2 weeks ago

Closing as duplicate of #45.