sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Don't gate IO libraries by default #494

Open hammer opened 3 years ago

hammer commented 3 years ago

Given the presence of wheels for all 3 of our upstream IO libraries, I think it makes sense to favor convenience now and have pip install sgkit pull in the IO libraries by default.

jeromekelleher commented 3 years ago

I'd vote for holding off on this until we have some conda packages or some insight into how hard this is going to be. I agree it's an unpleasant hack, but we live in a world of unpleasant hacks.

Plus, we'll make life miserable/impossible for Windows users if we don't gate these libraries. I don't think we should write off Windows users; there are plenty of things people can do with sgkit without needing access to VCF/plink etc. files.

tomwhite commented 3 years ago

> we'll make life miserable/impossible for Windows users if we don't gate these libraries

Plink and bgen already work on Windows (and have very small wheels), so it's only VCF that's an issue. It should be possible to put all the IO libraries in the top-level namespace and either make the VCF ones not present or raise exceptions on Windows.
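A minimal sketch of the "raise exceptions on Windows" option, to make the idea concrete. The helper names `make_unavailable` and `export_vcf_reader` are hypothetical and not part of sgkit; the real reader is passed in as a loader so the optional import only happens where it is expected to work.

```python
import sys
from typing import Callable


def make_unavailable(name: str, reason: str) -> Callable:
    """Return a stand-in function that raises a clear error when called."""

    def _stub(*args, **kwargs):
        raise NotImplementedError(f"{name} is unavailable: {reason}")

    _stub.__name__ = name
    return _stub


def export_vcf_reader(
    real_loader: Callable[[], Callable], platform: str = sys.platform
) -> Callable:
    """Choose what to expose as the top-level VCF reader (hypothetical helper).

    real_loader imports and returns the real reader; it is only called on
    platforms where the VCF dependency is expected to install.
    """
    if platform.startswith("win"):
        # Assumption: VCF support cannot be installed on Windows.
        return make_unavailable("read_vcf", "VCF support is not available on Windows")
    try:
        return real_loader()
    except ImportError as exc:
        # The optional dependency is missing on a supported platform.
        return make_unavailable("read_vcf", f"optional dependency missing ({exc})")
```

With this shape the name always exists in the top-level namespace, so importing the package never fails; only calling the VCF reader on an unsupported platform does. The alternative mentioned above, leaving the name out entirely on Windows, would instead surface as an `AttributeError` or `ImportError` at the call site.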

BTW, I think there is a workaround for VCF on Windows, which would be to use scikit-allel's vcf_to_zarr function followed by sgkit's read_vcfzarr function, although I haven't tried it.
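The workaround could look roughly like the sketch below. It is untested here, as in the comment above. The wrapper name `vcf_to_sgkit_via_zarr` is hypothetical, and `sgkit.read_vcfzarr` is the name used in this thread; the function may be named differently in other sgkit versions.

```python
def vcf_to_sgkit_via_zarr(vcf_path, zarr_path):
    """Convert a VCF to Zarr with scikit-allel, then load it with sgkit.

    Hypothetical wrapper. Imports are local so that defining the function
    does not require either library to be installed.
    """
    import allel  # scikit-allel
    import sgkit

    # fields="*" asks scikit-allel to extract every field it finds in the VCF.
    allel.vcf_to_zarr(vcf_path, zarr_path, fields="*", overwrite=True)

    # Assumption: read_vcfzarr accepts the path of the Zarr store written above.
    return sgkit.read_vcfzarr(zarr_path)
```

A call would then be something like `ds = vcf_to_sgkit_via_zarr("sample.vcf", "sample.zarr")`, where both paths are placeholders.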

jeromekelleher commented 3 years ago

OK, but let's see if we can get a conda package build before we start changing stuff. There will be headaches, I'm sure of it.

I'll start the ball rolling there after we get the next release out?

tomwhite commented 3 years ago

> There will be headaches, I'm sure of it.

😄

> I'll start the ball rolling there after we get the next release out?

So the next release should not be a pre-release, right?

jeromekelleher commented 3 years ago

> So the next release should not be a pre-release, right?

Actually, no, it can be either. It turns out that we can have pre-release packages; it's just that conda is hopeless at differentiating them from non-prereleases. But if we only have pre-releases up there, then it should be fine.

So, the general rule is "don't make pre-release conda packages", but in this case it'll be fine for us.

jeromekelleher commented 3 years ago

Probably the path of least resistance for us will be to make a bioconda package, which seems like the right thing to do given the dependencies. Windows users can use the pip package, which is fine I think.