zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
http://zarr.readthedocs.io/
MIT License

Use config to select implementation #1982

Open brokkoli71 opened 3 weeks ago

brokkoli71 commented 3 weeks ago

fixes #1878

Using the config (https://github.com/pytroll/donfig), the user can now specify the implementation of all codecs, the CodecPipeline, Buffer, and NDBuffer. For each of these objects, the registry can manage multiple different implementations and will use the one selected by the config.

Further changes:

normanrz commented 2 weeks ago

@madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?

madsbk commented 2 weeks ago

> @madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?

I think it would be a good idea. Or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

brokkoli71 commented 2 weeks ago

@madsbk

> I think it would be a good idea. Or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

Do you mean, in addition to the BufferPrototype parameter in e.g. setitem, to store another fallback BufferPrototype in the AsyncArray instance, which could be set when the array is created? The decision of which buffer to use would then be:

prototype in setitem → prototype in AsyncArray instance → config → numpy (where "→" means falling back to the next option if the previous one is not set)
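The proposed fallback chain could be sketched roughly like this (illustrative code only, with stand-in types, not the PR's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufferPrototype:
    """Minimal stand-in for zarr's BufferPrototype."""
    name: str

# Final fallback: always available, numpy-backed.
NUMPY_PROTOTYPE = BufferPrototype("numpy")

# Stand-in for the global config value; None means "not configured".
config_prototype: Optional[BufferPrototype] = None

def resolve_prototype(
    call_prototype: Optional[BufferPrototype],      # passed to e.g. setitem
    instance_prototype: Optional[BufferPrototype],  # stored on the AsyncArray
) -> BufferPrototype:
    # prototype in setitem → prototype in AsyncArray instance → config → numpy
    for candidate in (call_prototype, instance_prototype, config_prototype):
        if candidate is not None:
            return candidate
    return NUMPY_PROTOTYPE

# With nothing set anywhere, we fall all the way through to numpy:
print(resolve_prototype(None, None).name)  # numpy
```

Each step only applies when every earlier step is unset, which is also why having three user-controllable levels before the numpy default may feel like too many, as noted below.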

madsbk commented 2 weeks ago

Yes, exactly. But I see your point, it might be a bit too many fallbacks :)

In any case, if we allow modification of default_buffer_prototype, I think we need another constant like numpy_buffer_prototype that is always backed by a numpy array for internal use. E.g. when reading the shard index, we always want to use numpy: https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/codecs/sharding.py#L610
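The idea of pinning a numpy-backed prototype for internal reads, independent of whatever the config selects as default, might look roughly like this (all names here are hypothetical stand-ins, not the actual sharding codec code):

```python
import numpy as np

class NDBuffer:
    """Minimal stand-in for zarr's NDBuffer, backed by a numpy array."""
    def __init__(self, array: np.ndarray) -> None:
        self._array = array

class BufferPrototype:
    """Minimal stand-in: just records which NDBuffer class to use."""
    def __init__(self, nd_buffer_cls: type) -> None:
        self.nd_buffer = nd_buffer_cls

# Always numpy-backed, regardless of what the config selects as the
# default_buffer_prototype (which might e.g. point at GPU memory).
numpy_buffer_prototype = BufferPrototype(NDBuffer)

def read_shard_index(raw: bytes, shape: tuple) -> NDBuffer:
    # The shard index must land in host memory for parsing, so this
    # internal read uses the pinned numpy prototype, never the default.
    data = np.frombuffer(raw, dtype=np.uint64).reshape(shape)
    return numpy_buffer_prototype.nd_buffer(data)
```

The point of the constant is that internal bookkeeping reads like the shard index stay on the CPU even when the user-configured default prototype targets some other memory.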

brokkoli71 commented 2 weeks ago

good point! @madsbk