Open danielballan opened 5 years ago
I haven't had a chance to investigate the failure.
The subclasses override _get_schema in the base class DataSourceMixin without calling super(), so self._chunks is never defined. It looks like there is a fair amount of copy-paste between the base class and its subclasses, so the easiest fix might be to remove that and use super(). Can't get to this today, but can revisit later this week.
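A minimal sketch of the failure mode and the super() fix, for illustration only. The names DataSourceMixin, _get_schema, and _chunks come from the discussion above; the method bodies, the BrokenSource and FixedSource classes, and the schema contents are hypothetical stand-ins, not the library's actual code.

```python
class DataSourceMixin:
    """Hypothetical stand-in for the real base class."""

    def _get_schema(self):
        # Assumption: the base class is where the chunk bookkeeping is set up.
        self._chunks = {"x": (2, 2)}
        return {"chunks": self._chunks}


class BrokenSource(DataSourceMixin):
    def _get_schema(self):
        # Copy-pasted body that never calls the base class,
        # so self._chunks is never assigned.
        return {"chunks": None}


class FixedSource(DataSourceMixin):
    def _get_schema(self):
        # Let the base class do the shared work, then add subclass details.
        schema = super()._get_schema()
        schema["extra"] = "subclass-specific metadata"
        return schema


broken = BrokenSource()
broken._get_schema()
print(hasattr(broken, "_chunks"))  # False

fixed = FixedSource()
fixed._get_schema()
print(hasattr(fixed, "_chunks"))  # True
```

Delegating to super() also removes the duplicated logic, so a later change to the base class cannot silently drift out of sync with the subclasses.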
This fix affects access via the server.
The client side constructs an xarray.Dataset backed by dask arrays with some chunking. When it loads data, it requests partitions specified by a variable name and a block "part", as in ('x', 0, 0, 1). If, on the server side, the DataSourceMixin subclass is holding a plain numpy array, not a dask array, then it ignores the "part" and always sends the whole array for the requested variable. On the client side, this manifests as a mismatch between the dask array's shape (the shape of the data it expects) and the shape of the numpy array that it receives, leading to errors in which the data that arrives is larger than the data expected.
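One way to picture honoring the "part" for a plain numpy array is sketched below. This is not necessarily what this PR does: block_slices and read_partition are hypothetical helpers, and the chunk layout is an assumed dask-style tuple of per-axis block sizes.

```python
import numpy as np


def block_slices(chunks, block_index):
    """Turn per-axis block sizes and a block index into numpy slices."""
    slices = []
    for sizes, i in zip(chunks, block_index):
        start = sum(sizes[:i])  # offset of block i along this axis
        slices.append(slice(start, start + sizes[i]))
    return tuple(slices)


def read_partition(variables, chunks, part):
    """Return only the requested block, even for a plain numpy array."""
    name, *block_index = part
    arr = variables[name]
    return arr[block_slices(chunks[name], block_index)]


x = np.arange(24).reshape(2, 3, 4)
# Assumed chunking: one block along the first two axes, two along the last.
chunks = {"x": ((2,), (3,), (2, 2))}

block = read_partition({"x": x}, chunks, ("x", 0, 0, 1))
print(block.shape)  # (2, 3, 2), the shape the client expects for this block
print(x.shape)      # (2, 3, 4), what arrives if the "part" is ignored
```

Slicing on the server keeps the transferred data the same size as the block the client asked for, rather than sending the whole variable for every request.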
I expect it's worth refining this to make it more efficient before merging, and it needs a test. This is just a request for comments and suggestions.