Closed — TomNicholas closed this 3 years ago
I think that they are because sometimes dask decides it knows better than me. One of the CI runs (ubuntu-latest, 3.9) has 10 PerformanceWarnings and 17 failures though, so there might also be other problems...
Do you think the failures are implementation dependent? In other words, should I merge this branch with #49 and see if the tests fare any better? Or do you think there is a problem with the tests themselves?
> should I merge this branch with #49 and see if the tests fare any better?
This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.
Even locally I don't get a consistent number of failures - I just ran the whole suite 3 times and got 17, then 19, then 18 failures. :confused:
What is consistent is that every parametrization of the test_2d_chunks_2d_hist test fails every time, as does the test_all_chunking_patterns_2d hypothesis test. So either those tests are wrong (I don't think they are...) or they indicate a real bug in the code.
I don't know what could be causing the non-deterministic behaviour apart from the dask PerformanceWarnings. I'm putting random data in the test fixtures, but the numpy random seed does get set in one of the existing tests (test_histogram_results_1d). I'll check whether we should be setting the seed at the test module level or something, but that's the only other reason for inconsistent behaviour I can think of. (It's not the hypothesis tests that are inconsistent either, so that's not the problem.)
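For reference, the reproducibility point can be demonstrated with plain NumPy: seeding the global RNG once (e.g. at module level, or in an autouse fixture) makes subsequently generated fixture data identical across runs. This is just a sketch of the idea, not the PR's actual test code:

```python
import numpy as np

# Seeding makes np.random.* calls deterministic: the same seed
# always yields the same sequence of "random" fixture data.
np.random.seed(0)
a = np.random.rand(3)

np.random.seed(0)
b = np.random.rand(3)

assert (a == b).all()  # identical draws after identical seeds
```

Without a seed (or with a seed set only in one unrelated test), each run of the suite sees different fixture data, which is one plausible source of run-to-run variation in the failure count.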
> This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.
Ah thanks, I had missed that!
Would it be worthwhile running the tests on the old, pre-#49 code?
> Would it be worthwhile running the tests on the old, pre-#49 code?
I just tried that in #58 (messed up a rebase before realising I actually needed to cherry-pick), but the tests still fail. Similar test behaviour - the same tests fail, though now a lot of them fail with
```
xhistogram/test/test_chunking.py:156: in test_all_chunking_patterns_dd_hist
    h = histogram(*[da for name, da in ds.data_vars.items()], bins=bins)
xhistogram/xarray.py:163: in histogram
    h_data, bins = _histogram(
xhistogram/core.py:339: in histogram
    bin_counts = _histogram_2d_vectorized(
xhistogram/core.py:163: in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
xhistogram/duck_array_ops.py:24: in f
    return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:5: in ravel_multi_index
    ???
../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:1551: in __array_function__
    return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

multi_index = [dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>, dask.array<digitize, sha... chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>], dims = [9, 10, 11, 12], mode = 'raise', order = 'C'

    @wraps(np.ravel_multi_index)
    def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
>       return multi_index.map_blocks(
            _ravel_multi_index_kernel,
            dtype=np.intp,
            chunks=(multi_index.shape[-1],),
            drop_axis=0,
            func_kwargs=dict(dims=dims, mode=mode, order=order),
        )
E       AttributeError: 'list' object has no attribute 'map_blocks'

../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/routines.py:1763: AttributeError
```
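For context on the failing call: dask's override of `ravel_multi_index` calls `multi_index.map_blocks(...)`, so it expects a single dask array, while the traceback shows a Python list of dask arrays being passed in. What the vectorized histogram relies on is NumPy's flattening of per-axis bin indices into linear indices; a minimal NumPy-only sketch (array names here are illustrative, not xhistogram's internals):

```python
import numpy as np

# Per-axis bin indices for 4 samples falling into a (3, 5) histogram grid
row_bins = np.array([0, 1, 2, 1])
col_bins = np.array([4, 0, 3, 2])

# ravel_multi_index maps each (row, col) pair to the flat index row * 5 + col
flat = np.ravel_multi_index((row_bins, col_bins), (3, 5))
# flat is [4, 5, 13, 7]

# bincount over the flat indices then yields the 2-D histogram counts
counts = np.bincount(flat, minlength=15).reshape(3, 5)
```

Each sample increments exactly one cell of the (3, 5) grid, which is why `counts.sum()` equals the number of samples.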
I've opened a dask issue to ask about the PerformanceWarnings.
That error is #27 (comment)
Hmm - I guess I could pin my local environment to `dask=2021.02.0` to see if the tests pass then... (EDIT: that did not work - same errors)
Thanks @jrbourbeau , that silences the warning, but unfortunately doesn't fix the failures, and the failures are still inconsistent! :sob:
I also see flaky tests when trying this PR out locally. FWIW the pytest-repeat plugin is a nice way to trigger a flaky test by running it several times. For example, `pytest -v xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks -x --count=20` (the `--count=20` part is where pytest-repeat comes in) consistently triggers a failure for me locally.
Interestingly, the failure for this particular test has to do with the dataarray_factory utility (see the traceback below) and not the actual histogramming logic (or rather, the test isn't getting to the histogram logic yet).
See https://github.com/dask/dask/issues/7711#issuecomment-849110461 for more, but I don't think `align_arrays=False` is the right thing to do here (without adding other rechunking logic to align the input arrays). I think eventually it could be a good idea to pick the chunk pattern ourselves (so that one input array with small chunks doesn't split all the others into tiny pieces), but that should only affect performance, not correctness.
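To make the fragmentation concern concrete, here is a stdlib-only sketch (my own helper, not dask or xhistogram code) of the finest common chunk pattern two chunkings of the same axis would be split into if naively aligned - note how slightly mismatched chunks shatter into many small pieces:

```python
from itertools import accumulate

def common_chunks(*chunk_tuples):
    # The union of cumulative chunk boundaries gives the finest pattern
    # that respects every input's block edges along this axis.
    boundaries = sorted(set().union(*(accumulate(c) for c in chunk_tuples)))
    out, prev = [], 0
    for b in boundaries:
        out.append(b - prev)
        prev = b
    return tuple(out)

# Chunks of 4 vs chunks of 3 along a length-12 axis fragment badly:
common_chunks((4, 4, 4), (3, 3, 3, 3))  # -> (3, 1, 2, 2, 1, 3)
```

This is why picking one chunk pattern up front (and rechunking the other inputs to it) can be preferable to letting every array's boundaries intersect.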
From a quick glance at the failures, it seems like there are generally 2 types of errors, e.g. `Dataset.chunk` raising things like `conflicting sizes for dimension 'n': length 12 on <this-array> and length 10 on {'n': <this-array>}`. I haven't looked carefully at these tests yet, but I can try to take a closer look soon. One thing I noticed is that `dims = [random.choice(string.ascii_lowercase) for ax in shape]` does allow for the potential of repeated dimension names in the same array.
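To illustrate the repeated-names hazard, and one possible fix (sampling without replacement - a sketch, not necessarily what the PR ended up doing):

```python
import random
import string

shape = (10, 12)

# random.choice draws each letter independently, so two axes can get the
# same name, e.g. ['c', 'c'] - an array with a repeated dimension name.
dims_maybe_dup = [random.choice(string.ascii_lowercase) for _ in shape]

# random.sample draws without replacement, guaranteeing unique names:
dims_unique = random.sample(string.ascii_lowercase, len(shape))
assert len(set(dims_unique)) == len(shape)
```

With unique dimension names, two randomly generated arrays can no longer accidentally claim different lengths for the "same" dimension.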
Thanks both. This is very helpful.
> I don't think align_arrays=False is the right thing to do here
Makes sense - I'll undo that now.
Looks like my dataset_factory fixture is causing at least some of the test failures.
> allow for the potential of repeated dimension names
Good point! I've pushed a commit to stop that happening, and everything seems to pass locally now! :champagne:
Merging #57 (6fc4161) into master (9c7c722) will increase coverage by 15.37%. The diff coverage is n/a.
```diff
@@            Coverage Diff             @@
##           master      #57      +/-   ##
===========================================
+ Coverage   81.81%   97.18%   +15.37%
===========================================
  Files           3        2        -1
  Lines         242      249        +7
  Branches       68       71        +3
===========================================
+ Hits          198      242       +44
+ Misses         37        5       -32
+ Partials        7        2        -5
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| xhistogram/duck_array_ops.py | | |
| xhistogram/xarray.py | 96.42% <0.00%> (+4.90%) | :arrow_up: |
| xhistogram/core.py | 97.40% <0.00%> (+18.05%) | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Coverage 96.61% <0.00%> (+5.08%)
Yep, I've just fixed them. (flake8 didn't like my fixtures though, so I did just have to stick a `# noqa` on the whole test_chunking file.)
I don't actually think the tests are complete yet though - there should also be tests targeting dask arrays of weights and bins.
> dask arrays of weights and bins.
Weights yes. Bins no. I think we want to always require bins to be in-memory.
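One way that "bins must be in-memory" requirement could be enforced is with an explicit guard at the API boundary. This helper is hypothetical (not xhistogram's actual API), just a sketch of the idea:

```python
import numpy as np

def check_bins(bins):
    # Hypothetical guard: insist that bin edges are a concrete in-memory
    # NumPy array, never a lazy/chunked (e.g. dask) array, since the bin
    # boundaries are needed eagerly to digitize every block.
    if not isinstance(bins, np.ndarray):
        raise TypeError(f"bins must be an in-memory np.ndarray, got {type(bins)!r}")
    return bins

check_bins(np.linspace(0.0, 1.0, 11))  # passes
```

A plain list or a chunked array would be rejected up front with a clear message instead of failing deep inside the histogramming code.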
Am I missing something, or are the Hypothesis tests gone now?
@gjoseph92 I moved them to another file to avoid a linting error with the hypothesis import, but forgot to git add that file before committing at the end of the day yesterday!
Thanks everyone for your comments - I think I've addressed them all. I've also turned the fixtures into normal functions, and finally I added a test for chunked weights.
One question is whether it would be a good idea to have a test for input arrays with unaligned chunks?
@TomNicholas I definitely think you should test with unaligned chunks, in both the inputs and the weights.
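A minimal shape such a test case could take (hypothetical test data, not the PR's actual fixtures): two dask arrays with the same shape but deliberately different chunk patterns, checked against the plain NumPy result.

```python
import numpy as np
import dask.array as da

# Same shape (4, 6), deliberately unaligned chunk patterns:
data = da.from_array(np.arange(24.0).reshape(4, 6), chunks=(2, 3))
weights = da.from_array(np.ones((4, 6)), chunks=(4, 2))
assert data.chunks != weights.chunks

# An elementwise combination forces dask to align the chunks internally;
# the computed result should still match plain NumPy.
result = float((data * weights).sum().compute())
assert result == float(np.arange(24.0).sum())
```

The correctness assertion is the important part: chunk alignment should only change how the work is split up, never the answer.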
gitwit
I'm stealing that haha
What's the right way to merge this PR into master?
Good question - apparently you used to have to make a new local branch and push that as a new PR, but now GitHub allows me (or probably you, as a maintainer) to edit the target branch directly.
This builds on #49 by adding a pretty comprehensive set of tests of different chunking arrangements.
There are some normal tests, and some tests that use the Hypothesis library to try out all sorts of different chunk shapes (inspired by @rabernat 's similar test in the rechunker library).
There are some failures, but I think that they are because sometimes dask decides it knows better than me and changes the chunks:
I'm not quite sure how that causes those tests to fail though - I'm not even sure that behaviour is deterministic.
How do I turn this feature off, @jrbourbeau @gjoseph92? Or alternatively, how do I debug what happened to cause those tests to fail?