sgkit-dev / bio2zarr

Convert bioinformatics file formats to Zarr
Apache License 2.0

Add multiple-file support to vcfpartition #252

Closed Will-Tyler closed 3 months ago

Will-Tyler commented 3 months ago

Description

This pull request adds support for multiple VCF/BCF files to the vcfpartition command and closes #212.

When the user specifies multiple VCF/BCF files, vcfpartition interprets the number of partitions argument as the total number of partitions among all the files. The partitions are distributed evenly among the files.
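
To illustrate what "distributed evenly" could mean here, the following is a minimal sketch and not the code in this pull request. The function name `distribute_partitions`, the choice to give any leftover partitions to the first files, and the requirement of at least one partition per file are all assumptions made for the example.

```python
def distribute_partitions(num_partitions: int, num_files: int) -> list[int]:
    """Split a total partition count as evenly as possible across files.

    Hypothetical helper for illustration only; not part of bio2zarr.
    """
    if num_files < 1:
        raise ValueError("need at least one file")
    # Assumption: every file must receive at least one partition.
    if num_partitions < num_files:
        raise ValueError("need at least one partition per file")
    base, extra = divmod(num_partitions, num_files)
    # Assumption: the first `extra` files each take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_files)]


print(distribute_partitions(10, 3))  # [4, 3, 3]
```

Under these assumptions the per-file counts differ by at most one and always sum to the requested total.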

Let me know if we should add a section to the vcfpartition documentation that describes how to partition multiple files.

Testing

I added some unit tests to test the changes to the vcfpartition CLI.

I tried to check the documentation changes manually by building the documentation (running make -C docs from the project directory), but I encountered this error:

rm -fR sample.vcz
asciinema-automation cast_scripts/vcf2zarr_convert.sh _static/vcf2zarr_convert.cast
make: asciinema-automation: No such file or directory
make: *** [_static/vcf2zarr_convert.cast] Error 1

I didn't spend much time trying to resolve this, but if you know the fix, that would help!

coveralls commented 3 months ago

Coverage Status

coverage: 98.843% (-0.04%) from 98.884% when pulling 21f142f66db6e350e633ca7c0a4f01e6b5f98aa3 on Will-Tyler:issue-212 into 31a593531370e4833e4a68028bbd20626bbc2e70 on sgkit-dev:main.

coveralls commented 3 months ago

Coverage Status

coverage: 98.886% (+0.002%) from 98.884% when pulling 21f142f66db6e350e633ca7c0a4f01e6b5f98aa3 on Will-Tyler:issue-212 into 31a593531370e4833e4a68028bbd20626bbc2e70 on sgkit-dev:main.

Will-Tyler commented 3 months ago

Looks like there were some ruff issues because I forgot to set up pre-commit. Should be good now.

jeromekelleher commented 3 months ago

> I tried to check the documentation changes manually by building the documentation (running make -C docs from the project directory), but I encountered this error:

The docs build is very fragile; it needs a good overhaul (#238) once we've figured out a better structure for the actual documentation (#239).

coveralls commented 3 months ago

Coverage Status

coverage: 98.843% (-0.04%) from 98.884% when pulling 99d7f7f852f3bc3b7cbfdc402469d05c68aec964 on Will-Tyler:issue-212 into 31a593531370e4833e4a68028bbd20626bbc2e70 on sgkit-dev:main.

jeromekelleher commented 3 months ago

The docs failure is because of NumPy 2.0 issues; these should go away once you rebase.

Will-Tyler commented 3 months ago

Thanks, I just rebased this branch. I would like to merge this pull request before #253.

coveralls commented 3 months ago

Coverage Status

coverage: 98.843% (-0.04%) from 98.884% when pulling 6a573a6b7d855e7b633a1898473540058495b1d5 on Will-Tyler:issue-212 into a75091eea82c693e6d92d1ee8ecc3371f48817e2 on sgkit-dev:main.