zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

`Array` / `AsyncArray` design #2497

Open d-v-b opened 5 days ago

d-v-b commented 5 days ago

The Array and Group classes expose synchronous methods, but many of those methods rely on async code. To bridge the two, Array and Group hold an _async_array and an _async_group attribute, respectively. Taking Array as an example, Array._async_array is an AsyncArray that actually does all of the IO, so many Array methods are thin wrappers that run the corresponding AsyncArray method synchronously.
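For readers who haven't looked at the source, here is a minimal sketch of the two-class layout described above. It is a toy model, not the zarr-python code: the `run_sync` helper, the list-backed storage, and the one-argument `_get_selection` signature are all assumptions made for illustration.

```python
import asyncio


def run_sync(coro):
    """Hypothetical helper: block until the coroutine finishes, then return its result."""
    return asyncio.run(coro)


class AsyncArray:
    """Toy stand-in for the class that does the actual IO."""

    def __init__(self, data):
        self._data = list(data)

    async def _get_selection(self, index):
        # Placeholder for real async IO (e.g. fetching chunks from a store).
        await asyncio.sleep(0)
        return self._data[index]


class Array:
    """Synchronous facade whose only attribute is the wrapped AsyncArray."""

    def __init__(self, async_array):
        self._async_array = async_array

    def _get_selection(self, index):
        # The sync method only forwards to the method of the same name on a different class.
        return run_sync(self._async_array._get_selection(index))


arr = Array(AsyncArray([10, 20, 30]))
print(arr._get_selection(1))
```

Note that `asyncio.run` cannot be called from inside a running event loop, so a real implementation needs a more careful blocking mechanism than this helper.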

An alternative would be a single class with a mix of sync and async methods, where the sync form invokes the async form within a blocking function.

My question for this issue is whether we like our current design with two classes. Do people find it intuitive, etc?

I'm not arguing for changing anything, but I will note that I experience a mild distaste when I look up the implementation of an Array method and find that it's actually just wrapping a method defined in a completely different class, which I must then look up to figure out what's really going on. The Array class has just one attribute, _async_array, which does all the work.

I'm not aware of a "synchronized layer above async layer" solution that doesn't involve extensive wrapping (if someone knows of one, please share it!). From a discoverability standpoint, though, Array._get_selection wrapping Array._a_get_selection seems a bit better than Array._get_selection wrapping AsyncArray._get_selection, especially if, in the first case, the sync and async methods are defined right next to each other in the source code.
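To make the comparison concrete, the single-class alternative might look roughly like the sketch below. Again, this is a toy under stated assumptions: the list-backed storage and the one-argument signatures are invented for illustration, and `asyncio.run` stands in for whatever blocking mechanism would really be used.

```python
import asyncio


class Array:
    """Toy single-class design: each sync method sits next to its async twin."""

    def __init__(self, data):
        self._data = list(data)

    async def _a_get_selection(self, index):
        # Placeholder for real async IO.
        await asyncio.sleep(0)
        return self._data[index]

    def _get_selection(self, index):
        # Blocking form: runs the async method defined directly above it.
        return asyncio.run(self._a_get_selection(index))


arr = Array([10, 20, 30])
print(arr._get_selection(2))                     # sync call
print(asyncio.run(arr._a_get_selection(0)))      # async call, driven manually
```

The amount of wrapping is the same as in the two-class design; the difference is only that the wrapper and the code it wraps live in the same class, a few lines apart.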