Design of an inherited xarray duck array test suite

TomNicholas commented 1 month ago

Hi @zac-HD and @tomwhite :wave:

@keewis and I spent today trying to design a duck array test suite for xarray using hypothesis. This (temporary) repo contains the sketch of what we're trying to do, with the eventual aim being that this sort of code lives upstream in xarray and in downstream duck array libraries like pint/dask/sparse/cubed etc.

The problem we are trying to solve is that we want to create a test suite that can be used to test xarray's wrapping of any duck array type, including cubed and pint as representative examples. We want this test suite to:

1. Specify the expected behaviour of an xarray-wrapped duck array, i.e. once the suite is complete, if it passes we can be confident that xarray users can use xarray with that duck array type without issues. Note this is not the same as just checking that the duck array obeys the array API standard (there are things in xarray that aren't in the standard, and things in the standard that aren't in xarray).
2. Be defined upstream in xarray, so that we have control over what features are tested downstream. For example, if xarray added support for scipy.stats.skew via a new reduction method da.skew, we would want to be able to add a test for this to the test suite for duck array libraries just by changing code in the xarray repository.
3. Be inherited by downstream duck array libraries (either in their main packages or in glue packages such as cubed-xarray and pint-xarray). That way xarray devs don't have to maintain all the tests, failures are first reported downstream, and it's on the devs of those packages to report any failures upstream in xarray if they think it's actually xarray's fault.
4. Test a wide variety of cases with few lines of code. For example, we want the downstream tester to just be able to import a TestDatasetReductions class that automatically runs many different reductions on many different xarray objects.
5. Allow the downstream tester to specify test behaviour unique to their duck array type, including:
   - array creation (e.g. you need to use the downstream package's array creation functions, and may want to parameterize over additional options such as chunking pattern)
   - array result comparison (e.g. pint requires calling .magnitude, and cubed/dask require calling .compute)
6. Allow the downstream tester to mark certain parameter combinations as expected failures. For example, cubed.mean() currently doesn't support taking means of integers (because it's not actually required by the array API standard), but xarray.DataArray.mean() expects this to be possible. So we want the test suite to test means of integers but give cubed's downstream tests the opportunity to mark that case as an expected failure.
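
To make the inheritance and override requirements above concrete, here is a minimal sketch in plain Python. It is not the code in this repo: `ReductionTests`, `LazyArray`, `Deferred`, `create_array`, `to_plain` and `expected_failures` are all hypothetical names, the toy `LazyArray` only stands in for a lazy duck array that lacks integer means, and a small `run()` loop over fixed inputs replaces what pytest and hypothesis would do in the real suite.

```python
import math


class Deferred:
    """Stand-in for a lazy result that must be computed before comparison."""

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        return self._thunk()


class LazyArray:
    """Toy duck array (hypothetical): lazy reductions, no integer mean."""

    def __init__(self, values):
        self._values = list(values)

    def sum(self):
        return Deferred(lambda: sum(self._values))

    def mean(self):
        if all(isinstance(v, int) for v in self._values):
            raise TypeError("mean of integer data is not supported")
        return Deferred(lambda: sum(self._values) / len(self._values))


class ReductionTests:
    """Upstream base class (hypothetical); downstream packages inherit it."""

    # Upstream decides which reductions and inputs are exercised.
    reductions = {
        "sum": lambda values: sum(values),
        "mean": lambda values: sum(values) / len(values),
    }
    cases = {"int": [1, 2, 3, 4], "float": [0.5, 1.5, 2.0]}

    # Downstream hook: (reduction, case) pairs that are expected to fail.
    expected_failures = frozenset()

    def create_array(self, values):
        """Downstream hook: build the duck array under test."""
        raise NotImplementedError

    def to_plain(self, result):
        """Downstream hook: turn a result into a plain comparable value."""
        return result

    def check(self, name, kind):
        values = self.cases[kind]
        expected = self.reductions[name](values)
        array = self.create_array(values)
        actual = self.to_plain(getattr(array, name)())
        assert math.isclose(actual, expected)

    def run(self):
        """Run every (reduction, case) pair and classify the outcome."""
        outcomes = {}
        for name in self.reductions:
            for kind in self.cases:
                try:
                    self.check(name, kind)
                    passed = True
                except Exception:
                    passed = False
                if (name, kind) in self.expected_failures:
                    outcomes[name, kind] = "xpass" if passed else "xfail"
                else:
                    outcomes[name, kind] = "passed" if passed else "failed"
        return outcomes


class LazyReductionTests(ReductionTests):
    """Downstream subclass: overrides only creation, comparison and xfails."""

    expected_failures = frozenset({("mean", "int")})

    def create_array(self, values):
        return LazyArray(values)

    def to_plain(self, result):
        return result.compute()


outcomes = LazyReductionTests().run()
```

In this sketch the downstream class contains no test logic of its own. Adding a new reduction to `ReductionTests.reductions` upstream would exercise it in every subclass, while the integer-mean case is still run but recorded as an expected failure.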

(4) is the reason why we are using hypothesis, and why we made the hypothesis strategies for generating arbitrary xarray.Variable objects.

(4), (5) and (6) are the most difficult parts of this to achieve simultaneously. We need a lot of control over test cases upstream in xarray, but we also need to give the downstream tester a lot of control to override things.

We would appreciate it if you could take a look at what we have done (both in main and in #4) and tell us whether you think we are headed in a good direction.

xref https://github.com/pydata/xarray/pull/4972 https://github.com/pydata/xarray/pull/6908 https://github.com/cubed-dev/cubed-xarray/issues/20

tomwhite commented 1 month ago

+1 for the direction.

Presumably the code in #4 will eventually end up in cubed-xarray, but you are keeping it here while the design evolves?

keewis commented 1 month ago

Yes, indeed, it helps to have multiple examples while writing up the actual tests. The current plan is to move this to xarray and archive this repository once we're confident that the structure of the testing framework is okay.

By the way, the tests in #4 already exposed some issues in cubed / cubed-xarray (not sure which, could also be xarray).

xarray-contrib/xarray-array-testing, issue #6