Open jhamman opened 4 days ago
👍 this seems good to me.
I think sharding is a big enough part of what zarr v3 promises, that it's worth having crc32c as part of the default dependencies. Looking at their files on PyPI the package is very light (~40kB), and it doesn't have any other requirements.
fsspec is also small (200kB), so I wonder if it's worth keeping as a default too, so users don't have to jump through extra hoops to open remote arrays? Given that a large use case of zarr is as a format for large data, a lot of the time users will be accessing it remotely.
What are the reasons for removing these? Definitely open to considering it, but given they're lightweight deps at the moment I'm thinking we should keep them as default.
> I think sharding is a big enough part of what zarr v3 promises, that it's worth having crc32c as part of the default dependencies. Looking at their files on PyPI the package is very light (~40kB), and it doesn't have any other requirements.
Is there a reason why we shouldn't put sharding in numcodecs? Then the crc32c dependency would live there.
👍 for that
Here's my thought on `fsspec`. While I agree that the package dependency is not particularly large, it also doesn't come with batteries included -- you still need `s3fs`, `gcsfs`, `adlfs`, etc. to use the `RemoteStore`. I imagine we're all aligned on keeping each of the individual implementations out of the required dependency tree. I guess my perspective is that if all of those are optional, and they all depend on `fsspec`, then we don't gain much by requiring `fsspec`.
@d-v-b and/or @dstansby - can one of you open an issue on `crc32c` in numcodecs?
That makes sense to me on `fsspec` - it would be good to add some docs if it's optional. I'll stick a request on https://github.com/zarr-developers/zarr-python/pull/2395.
I opened an issue for `crc32c` at https://github.com/zarr-developers/numcodecs/issues/610
I also think that we should only drop `crc32c` as a core zarr dependency once it is part of numcodecs. It would suck if people had to install additional groups to be able to use sharding.
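For illustration, here is a minimal sketch of how a feature could guard an optional dependency so that users at least get an actionable error. This is not how zarr-python does it: the helper name `require_optional` and the extras group name passed to it are hypothetical, and the only real API used is `importlib.util.find_spec` from the standard library.

```python
import importlib.util


def require_optional(module_name: str, extra: str) -> None:
    """Raise a helpful ImportError if an optional top-level module is missing.

    `require_optional` and the `extra` group name are hypothetical; they are
    used here only to build the error message, not taken from zarr-python.
    """
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'zarr[{extra}]'"
        )
```

A store or codec would call something like `require_optional("fsspec", "remote")` before its first use of the module, where `"remote"` is again just an assumed group name.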
I'd like to open the conversation about what Zarr's core dependencies are for 3.0. Currently, this looks like:
https://github.com/zarr-developers/zarr-python/blob/11312534ebe683d73cbbcc2da9e88933cb00cc14/pyproject.toml#L25-L34
Some of these are not used anymore (`asciitree` and `fasteners`) so those can safely go.

Then there is `fsspec` and `crc32c`. These are only needed for the `RemoteStore` and `ShardingCodec`, respectively. What do we think about making these optional?

One proposed diff in our dependencies would look something like:
Notes: