sgkit-dev / vcztools

Partial reimplementation of bcftools for VCF Zarr
Apache License 2.0
3 stars 3 forks source link

Add query list samples command #58

Closed Will-Tyler closed 2 months ago

Will-Tyler commented 2 months ago

Overview

This pull request implements the bcftools query --list-samples command from bcftools in vcztools. The vcztools command loads the sample IDs from the VCF Zarr group and prints them.

This pull request closes #51.

Example usage

vcztools query -l vcz_test_cache/sample.vcf.vcz
NA00001
NA00002
NA00003
vcztools query --list-samples vcz_test_cache/sample.vcf.vcz
NA00001
NA00002
NA00003

Testing

I added some tests that compare bcftools' output with vcztools' output along with some unit tests. The code introduced in this pull request has good coverage.

~The coverage tool shows no coverage on the code added in this PR because the tests run the code in a different process. Most of the VCF writer code is also not covered because of this testing approach.~

References

tomwhite commented 2 months ago

The coverage tool shows no coverage on the code added in this PR because the tests run the code in a different process.

I've been wondering what our testing strategy should be, and this is one reason to add unit tests that call functions directly (i.e. not via the CLI). In this case it should just be a matter of adding a unit test for list_samples.

I think we need validation tests too of course, but these are less focused on exercising test coverage and more about checking that we match bcftools across a wide range of CLI parameters. So in this case I think it's fine to keep the tests you've already added as well.

Most of the VCF writer code is also not covered because of this testing approach.

I think a lot of the missing VCF writer coverage is because we are not exercising VCF header generation yet, but that's a bigger topic - see #47 and related issues.