Open mathause opened 5 months ago
Hello,
I started digging into this but got somewhat lost in the code, so I changed approach. I assume that the behaviour of weighted reductions should be aligned as much as possible with that of the regular non-weighted reductions, which is currently not the case. So the logic has to be aligned somehow, potentially by refactoring the duplicated parts away. It seems this family of bugs (or, better said, unexpected behaviours) stems from diverging implementations and a lack of delegation to the existing default logic. Of course, this is easier said than done!
I have written a test describing common scenarios, comparing non-weighted (regular) reductions with weighted reductions. You can find the code of the test (`test_non_weighted_vs_weighted_behaviour`) here for more details: https://github.com/etienneschalk/xarray/blob/ab0488caf3e81e37e02beeb709cf0c46f72a7c83/xarray/tests/test_weighted.py#L916 (:warning: this is work in progress; the code is not clean and I am experimenting with breaking things!)
Here is the matrix of behaviour summarizing the testing approach:
The "Weighted (expected)" column is identical to "Non-weighted (reference)"; I included it to emphasize that the behaviours should match, though perhaps not 100% if subtle unavoidable variations are discovered. For now, I assume it should be a perfect match.
| Scenario ID / Behaviour | Non-weighted (reference) | Weighted (expected) | Weighted (current) | Is current aligned on reference? |
|---|---|---|---|---|
| (1) DataArray `xr.DataArray(1)` | :x: `r"'x' not found in array dimensions ()"` (Handled on DataArray level) | :x: `r"'x' not found in array dimensions ()"` (Handled on DataArray level) | :x: `r"'x' not found in array dimensions ()"` (Handled on DataArray level) | :white_check_mark: |
| (2) Dataset without reduction dim `xr.Dataset({"var": 1})` | :x: `r"Dimensions ('x',) not found in data dimensions ()"` (Handled on Dataset level) | :x: `r"Dimensions ('x',) not found in data dimensions ()"` (Handled on Dataset level) | :x: `r"'x' not found in array dimensions ()"` (Handled on DataArray level) | :x: |
| (3) Dataset with reduction dim `xr.Dataset({"var": 1, "x_dependant": ("x", [2, 4])})` | :white_check_mark: (Handled on Dataset level) | :white_check_mark: (Handled on Dataset level) | :x: `r"'x' not found in array dimensions ()"` (Handled on DataArray level) | :x: |
So, according to this table, the issue is that the weighted behaviour for Dataset is not as specialized as the non-weighted one. Indeed, non-weighted Dataset reductions skip the reduction for any variable that does not depend on the reducing dimension. The only check seems to be that the Dataset being reduced contains at least one variable that depends on the reducing dimension. This "lenient" approach seems more user-friendly: Datasets commonly hold variables with different dimensions, so ignoring the variables that lack the dimension simplifies the user experience. Reductions over DataArrays are stricter, since the user is expected to know the dimensions of the single DataArray being reduced (as opposed to the many variables of a Dataset).
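For reference, the non-weighted column of the table can be reproduced with a few lines. This is only a minimal sketch: `try_mean` is a hypothetical helper introduced here for illustration, and the exact error messages may differ between xarray versions, so the snippet only distinguishes "raises `ValueError`" from "succeeds".

```python
import xarray as xr


def try_mean(obj, dim):
    """Hypothetical helper: return the reduced object, or the ValueError raised."""
    try:
        return obj.mean(dim)
    except ValueError as err:
        return err


# The three scenarios from the table, non-weighted reductions only.
scalar_da = xr.DataArray(1)
ds_without_x = xr.Dataset({"var": 1})
ds_with_x = xr.Dataset({"var": 1, "x_dependant": ("x", [2, 4])})

res1 = try_mean(scalar_da, "x")     # (1) strict: error raised at DataArray level
res2 = try_mean(ds_without_x, "x")  # (2) error raised at Dataset level
res3 = try_mean(ds_with_x, "x")     # (3) lenient: "var" is not reduced over "x"

for label, res in [("(1)", res1), ("(2)", res2), ("(3)", res3)]:
    print(label, type(res).__name__)
```

In scenario (3), `"var"` does not depend on `x` and is carried through without an error, while `"x_dependant"` is averaged over `x`.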
The question now would be: how can weighted reductions be aligned with their non-weighted counterparts, so that "UX choices" are made only once (in the non-weighted reductions, which serve as the reference)?
What happened?
`ds.weighted(weights).mean(dims)` errors when reducing over a dimension that is neither on the `weights` nor on the variable.
What did you expect to happen?
This used to work and was "broken" by #8606. However, we may want to fix this by ignoring (?) those data vars instead (#7027).
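Until this is settled, the "ignore those data vars" option can be approximated at the user level. The sketch below is an assumption about what such behaviour could look like, not xarray's implementation or the eventual fix: `weighted_mean_skipping` is a hypothetical helper that applies the weighted mean only to variables that have the reducing dimension and passes the others through unchanged.

```python
import xarray as xr


def weighted_mean_skipping(ds, weights, dim):
    """Hypothetical helper: weighted mean over `dim`, skipping variables without it."""
    # Dataset-level check, mirroring the non-weighted behaviour:
    # the dimension must exist somewhere in the Dataset.
    if dim not in ds.sizes:
        raise ValueError(f"Dimension {dim!r} not found in data dimensions {tuple(ds.sizes)}")

    reduced = {}
    for name, da in ds.data_vars.items():
        if dim in da.dims:
            # Only variables that depend on `dim` are actually reduced.
            reduced[name] = da.weighted(weights).mean(dim)
        else:
            # Variables without `dim` are passed through untouched.
            reduced[name] = da
    return xr.Dataset(reduced, attrs=ds.attrs)


ds = xr.Dataset({"var": 1, "x_dependant": ("x", [2, 4])})
weights = xr.DataArray([1, 3], dims="x")

out = weighted_mean_skipping(ds, weights, "x")
print(out)
```

Note one difference from the non-weighted reduction: here a skipped variable keeps its original value and dtype, whereas `ds.mean("x")` still applies the reduction function to it (over no dimensions).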
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
Newest main (i.e. 2024.01)