sgkit-dev / bio2zarr

Convert bioinformatics file formats to Zarr
Apache License 2.0

Change dexplode-init to use ``--num-parts``/``-n`` instead of positional #243

Closed by jeromekelleher 1 month ago

jeromekelleher commented 1 month ago

This makes more sense because it matches how vcf-partition works, and because it leaves room to partition by VCF file chunk size as well.

So, we'd use

vcf2zarr dexplode-init sample.vcf.gz sample-dist.icf -n 5

We don't need to implement partitioning by file size yet, but we can record it as something to add later.
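To make the proposed command shape concrete, here is a minimal sketch of the interface using the standard-library argparse. It is only an illustration under stated assumptions, not the bio2zarr implementation: the `build_parser` helper, the single `vcf` positional, and the help text are all hypothetical, and only the command name and the ``--num-parts``/``-n`` option come from this issue.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the vcf2zarr CLI; only the option names
    # --num-parts / -n and the subcommand name come from the issue.
    parser = argparse.ArgumentParser(prog="vcf2zarr")
    subcommands = parser.add_subparsers(dest="command", required=True)

    init = subcommands.add_parser("dexplode-init")
    init.add_argument("vcf", help="input VCF file (assumed single path here)")
    init.add_argument("icf", help="output intermediate columnar format path")
    # The partition count is an option rather than a positional argument.
    # Leaving it as None lets the caller choose a default later.
    init.add_argument(
        "-n",
        "--num-parts",
        type=int,
        default=None,
        help="target number of partitions",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["dexplode-init", "sample.vcf.gz", "sample-dist.icf", "-n", "5"]
    )
    print(args.vcf, args.icf, args.num_parts)
```

Keeping the count as an option means a later file-size-based flag could sit alongside it without changing the positional arguments.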

jeromekelleher commented 1 month ago

Another option would be to partition the file maximally, if --num-partitions isn't specified.
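A rough sketch of what that default could look like, assuming the caller already knows the largest number of partitions the file supports. The `resolve_num_partitions` function and its `max_partitions` argument are hypothetical names for illustration, and capping an oversized request at the maximum is an assumption rather than something decided in this issue.

```python
from typing import Optional


def resolve_num_partitions(requested: Optional[int], max_partitions: int) -> int:
    """Pick the partition count to use (hypothetical helper).

    requested: value given on the command line, or None if omitted.
    max_partitions: largest partition count the input supports (assumed known).
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    if requested is None:
        # Option omitted: partition the file maximally.
        return max_partitions
    if requested < 1:
        raise ValueError("requested partition count must be at least 1")
    # Assumption: a request above the maximum is capped, not rejected.
    return min(requested, max_partitions)
```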