Closed — TomNicholas closed this 3 years ago
I think that they are because sometimes dask decides it knows better than me. One of the CI runs (ubuntu-latest, 3.9) has 10 PerformanceWarnings and 17 failures though, so there might also be other problems...
Do you think the failures are implementation dependent? In other words, should I merge this branch with #49 and see if the tests fare any better? Or do you think there is a problem with the tests themselves?
> should I merge this branch with #49 and see if the tests fare any better?
This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.
Even locally I don't get a consistent number of failures - I just ran the whole suite 3 times and got 17, then 19, then 18 failures. :confused:
What is consistent is that every parametrization of the test_2d_chunks_2d_hist test fails every time, as does the test_all_chunking_patterns_2d hypothesis test. So either those tests are wrong (I don't think they are...) or they indicate a real bug in the code.
I don't know what could be causing the non-deterministic behaviour apart from the dask PerformanceWarnings. I'm putting random data in the test fixtures, but the numpy random seed does get set in one of the existing tests (test_histogram_results_1d). I'll check whether we should be setting the seed at the test module level or something, but that's the only other reason for inconsistent behaviour I can think of. (It's not the hypothesis tests that are inconsistent either, so that's not the problem.)
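For reference, the reproducibility point can be demonstrated with plain NumPy: seeding the global RNG once (e.g. at module level, or in an autouse fixture) makes subsequently generated fixture data identical across runs. This is just a sketch of the idea, not the PR's actual test code:

```python
import numpy as np

# Seeding makes np.random.* calls deterministic: the same seed
# always yields the same sequence of "random" fixture data.
np.random.seed(0)
a = np.random.rand(3)

np.random.seed(0)
b = np.random.rand(3)

assert (a == b).all()  # identical draws after identical seeds
```

Without a seed (or with a seed set only in one unrelated test), each run of the suite sees different fixture data, which is one plausible source of run-to-run variation in the failure count.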
> This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.
Ah thanks, I had missed that!
Would it be worthwhile running the tests on the old, pre-#49 code?
> Would it be worthwhile running the tests on the old, pre-#49 code?
I just tried that in #58 (messed up a rebase before realising I actually needed to cherry-pick), but the tests still fail. Similar test behaviour - the same tests fail, though now a lot of them fail with
```
xhistogram/test/test_chunking.py:156: in test_all_chunking_patterns_dd_hist
    h = histogram(*[da for name, da in ds.data_vars.items()], bins=bins)
xhistogram/xarray.py:163: in histogram
    h_data, bins = _histogram(
xhistogram/core.py:339: in histogram
    bin_counts = _histogram_2d_vectorized(
xhistogram/core.py:163: in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
xhistogram/duck_array_ops.py:24: in f
    return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:5: in ravel_multi_index
    ???
../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:1551: in __array_function__
    return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

multi_index = [dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>, dask.array<digitize, sha... chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>], dims = [9, 10, 11, 12], mode = 'raise', order = 'C'

    @wraps(np.ravel_multi_index)
    def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
>       return multi_index.map_blocks(
            _ravel_multi_index_kernel,
            dtype=np.intp,
            chunks=(multi_index.shape[-1],),
            drop_axis=0,
            func_kwargs=dict(dims=dims, mode=mode, order=order),
        )
E       AttributeError: 'list' object has no attribute 'map_blocks'

../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/routines.py:1763: AttributeError
```
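For context on the failing call: dask's override of `ravel_multi_index` calls `multi_index.map_blocks(...)`, so it expects a single dask array, while the traceback shows a Python list of dask arrays being passed in. What the vectorized histogram relies on is NumPy's flattening of per-axis bin indices into linear indices; a minimal NumPy-only sketch (array names here are illustrative, not xhistogram's internals):

```python
import numpy as np

# Per-axis bin indices for 4 samples falling into a (3, 5) histogram grid
row_bins = np.array([0, 1, 2, 1])
col_bins = np.array([4, 0, 3, 2])

# ravel_multi_index maps each (row, col) pair to the flat index row * 5 + col
flat = np.ravel_multi_index((row_bins, col_bins), (3, 5))
# flat is [4, 5, 13, 7]

# bincount over the flat indices then yields the 2-D histogram counts
counts = np.bincount(flat, minlength=15).reshape(3, 5)
```

Each sample increments exactly one cell of the (3, 5) grid, which is why `counts.sum()` equals the number of samples.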
I've opened a dask issue to ask about the PerformanceWarnings.
That error is #27 (comment)
Hmm - I guess I could pin my local environment to `dask=2021.02.0` to see if the tests pass then... (EDIT: that did not work - same errors)
Thanks @jrbourbeau , that silences the warning, but unfortunately doesn't fix the failures, and the failures are still inconsistent! :sob:
I also see flaky tests when trying this PR out locally. FWIW the pytest-repeat plugin is a nice way to trigger a flaky test by running it several times. For example, `pytest -v xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks -x --count=20` (the `--count=20` part is where pytest-repeat comes in) consistently triggers a failure for me locally.
Interestingly, the failure for this particular test has to do with the dataarray_factory utility (see the traceback below) and not the actual histogramming logic (or rather, the test isn't getting to the histogram logic yet).
See https://github.com/dask/dask/issues/7711#issuecomment-849110461 for more, but I don't think `align_arrays=False` is the right thing to do here (without adding other rechunking logic to align the input arrays). I think eventually it could be a good idea to pick the chunk pattern ourselves (so that one input array with small chunks doesn't split all the others into tiny pieces), but that should only affect performance, not correctness.
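To make the fragmentation concern concrete, here is a stdlib-only sketch (my own helper, not dask or xhistogram code) of the finest common chunk pattern two chunkings of the same axis would be split into if naively aligned - note how slightly mismatched chunks shatter into many small pieces:

```python
from itertools import accumulate

def common_chunks(*chunk_tuples):
    # The union of cumulative chunk boundaries gives the finest pattern
    # that respects every input's block edges along this axis.
    boundaries = sorted(set().union(*(accumulate(c) for c in chunk_tuples)))
    out, prev = [], 0
    for b in boundaries:
        out.append(b - prev)
        prev = b
    return tuple(out)

# Chunks of 4 vs chunks of 3 along a length-12 axis fragment badly:
common_chunks((4, 4, 4), (3, 3, 3, 3))  # -> (3, 1, 2, 2, 1, 3)
```

This is why picking one chunk pattern up front (and rechunking the other inputs to it) can be preferable to letting every array's boundaries intersect.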
From a quick glance at the failures, it seems like there are generally 2 types of errors, e.g. `Dataset.chunk` raising things like `conflicting sizes for dimension 'n': length 12 on <this-array> and length 10 on {'n': <this-array>}`. I haven't looked carefully at these tests yet, but I can try to take a closer look soon. One thing I noticed is that `dims = [random.choice(string.ascii_lowercase) for ax in shape]` does allow for the potential of repeated dimension names in the same array.
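To illustrate the repeated-names hazard, and one possible fix (sampling without replacement - a sketch, not necessarily what the PR ended up doing):

```python
import random
import string

shape = (10, 12)

# random.choice draws each letter independently, so two axes can get the
# same name, e.g. ['c', 'c'] - an array with a repeated dimension name.
dims_maybe_dup = [random.choice(string.ascii_lowercase) for _ in shape]

# random.sample draws without replacement, guaranteeing unique names:
dims_unique = random.sample(string.ascii_lowercase, len(shape))
assert len(set(dims_unique)) == len(shape)
```

With unique dimension names, two randomly generated arrays can no longer accidentally claim different lengths for the "same" dimension.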
Thanks both. This is very helpful.
> I don't think align_arrays=False is the right thing to do here
Makes sense - I'll undo that now.
Looks like my dataset_factory fixture is causing at least some of the test failures.
> allow for the potential of repeated dimension names
Good point! I've pushed a commit to stop that happening, and everything seems to pass locally now! :champagne:
Merging #57 (6fc4161) into master (9c7c722) will increase coverage by 15.37%. The diff coverage is n/a.
```diff
@@            Coverage Diff             @@
##           master      #57      +/-   ##
===========================================
+ Coverage   81.81%   97.18%   +15.37%
===========================================
  Files           3        2        -1
  Lines         242      249        +7
  Branches       68       71        +3
===========================================
+ Hits          198      242       +44
+ Misses         37        5       -32
+ Partials        7        2        -5
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| xhistogram/duck_array_ops.py | | |
| xhistogram/xarray.py | 96.42% <0.00%> (+4.90%) | :arrow_up: |
| xhistogram/core.py | 97.40% <0.00%> (+18.05%) | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Coverage 96.61% <0.00%> (+5.08%)
Yep, I've just fixed them. (flake8 didn't like my fixtures though, so I did just have to stick a `# noqa` on the whole test_chunking file.)
I don't actually think the tests are complete yet though - there should also be tests targeting dask arrays of weights and bins.
> dask arrays of weights and bins.
Weights yes. Bins no. I think we want to always require bins to be in-memory.
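One way that "bins must be in-memory" requirement could be enforced is with an explicit guard at the API boundary. This helper is hypothetical (not xhistogram's actual API), just a sketch of the idea:

```python
import numpy as np

def check_bins(bins):
    # Hypothetical guard: insist that bin edges are a concrete in-memory
    # NumPy array, never a lazy/chunked (e.g. dask) array, since the bin
    # boundaries are needed eagerly to digitize every block.
    if not isinstance(bins, np.ndarray):
        raise TypeError(f"bins must be an in-memory np.ndarray, got {type(bins)!r}")
    return bins

check_bins(np.linspace(0.0, 1.0, 11))  # passes
```

A plain list or a chunked array would be rejected up front with a clear message instead of failing deep inside the histogramming code.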
Am I missing something, or are the Hypothesis tests gone now?
@gjoseph92 I moved them to another file to avoid a linting error with the hypothesis import, but forgot to git add that file before committing at the end of the day yesterday!
Thanks everyone for your comments - I think I've addressed them all. I've also turned the fixtures into normal functions, and finally I added a test for chunked weights.
One question is whether it would be a good idea to have a test for input arrays with unaligned chunks?
@TomNicholas I definitely think you should test with unaligned chunks, in both the inputs and the weights.
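A minimal shape such a test case could take (hypothetical test data, not the PR's actual fixtures): two dask arrays with the same shape but deliberately different chunk patterns, checked against the plain NumPy result.

```python
import numpy as np
import dask.array as da

# Same shape (4, 6), deliberately unaligned chunk patterns:
data = da.from_array(np.arange(24.0).reshape(4, 6), chunks=(2, 3))
weights = da.from_array(np.ones((4, 6)), chunks=(4, 2))
assert data.chunks != weights.chunks

# An elementwise combination forces dask to align the chunks internally;
# the computed result should still match plain NumPy.
result = float((data * weights).sum().compute())
assert result == float(np.arange(24.0).sum())
```

The correctness assertion is the important part: chunk alignment should only change how the work is split up, never the answer.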
gitwit
I'm stealing that haha
What's the right way to merge this PR into master?
Good question - apparently you used to have to make a new local branch and push that as a new PR, but now GitHub allows me (or probably you, as a maintainer) to edit the target branch directly.
This builds on #49 by adding a pretty comprehensive set of tests of different chunking arrangements.
There are some normal tests, and some tests that use the Hypothesis library to try out all sorts of different chunk shapes (inspired by @rabernat 's similar test in the rechunker library).
There are some failures, but I think that they are because sometimes dask decides it knows better than me and changes the chunks:
I'm not quite sure how that causes those tests to fail though - I'm not even sure that behaviour is deterministic.
How do I turn this feature off, @jrbourbeau @gjoseph92? Or alternatively, how do I debug what happened to cause those tests to fail?