Open 3tilley opened 3 days ago
Very much agree with your suggestion.
It's likely worth spiking a quick example of what the code changes would look like.
Responding quickly from memory: it's plausible that the current dataset code doesn't expect data variables to be of different types, and that the dataset handles the `.mean` operation itself rather than delegating it down to each data variable. If that's the case, it might require fairly large changes with lots of `if isweighted` checks, which wouldn't be great.
If it does delegate the `.mean` to each data variable (or we could change it to do that), then this could work quite nicely. It might also be generalizable to reduction operations on other arrays, such as sparse arrays...
Does that make sense?
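To make the delegation idea concrete, here is a minimal sketch of reducing a dataset by calling `.mean` on each data variable separately, using `Dataset.map`. The variable names and values are made up for illustration, and this only shows the user-facing pattern; it does not claim to reflect how `Dataset.mean` is implemented internally.

```python
import xarray as xr

# Hypothetical toy dataset with two data variables sharing dimension "x".
ds = xr.Dataset(
    {
        "a": ("x", [1.0, 2.0, 3.0]),
        "b": ("x", [10.0, 20.0, 30.0]),
    }
)

# Delegate the reduction to each data variable in turn.
per_variable = ds.map(lambda da: da.mean("x"))

# The built-in dataset reduction, for comparison.
built_in = ds.mean("x")

print(per_variable)
print(built_in)
```

If each data variable carried its own notion of `.mean`, a per-variable pattern like this is where a weighted variant could plug in.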
> I would like DataArrays that are unweighted to return the usual mean,
I don't think this is a good idea. Consider the case when the weights Dataset is mistakenly missing a couple of data vars. Then you'll unintentionally get unweighted means and not know about it!
You might consider simply adding a scalar `1` in the weights Dataset for any missing data var.
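A minimal sketch of that suggestion, assuming the per-variable weights are kept in a plain dict keyed by variable name. Since `weighted` takes a `DataArray` of weights, `xr.ones_like` stands in for the scalar `1` here. The helper name `weighted_mean_per_variable` and the data are hypothetical, not part of xarray.

```python
import xarray as xr

# Hypothetical toy dataset; only "a" has weights supplied.
ds = xr.Dataset(
    {
        "a": ("x", [1.0, 2.0, 3.0]),
        "b": ("x", [10.0, 20.0, 30.0]),
    }
)
weights = {"a": xr.DataArray([1.0, 1.0, 2.0], dims="x")}


def weighted_mean_per_variable(ds, weights, dim=None):
    """Weighted mean of every data variable (hypothetical helper).

    Variables without an entry in ``weights`` get weights of one,
    which reproduces the ordinary mean for them.
    """
    out = {}
    for name, da in ds.data_vars.items():
        w = weights.get(name)
        if w is None:
            # Explicit fallback: uniform weights for this variable.
            w = xr.ones_like(da)
        out[name] = da.weighted(w).mean(dim)
    return xr.Dataset(out)


result = weighted_mean_per_variable(ds, weights, dim="x")
print(result)
```

To guard against the silent-fallback problem, the `None` branch could raise a `KeyError` instead of filling in ones.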
> Consider the case when the weights Dataset is mistakenly missing a couple of data vars. Then you'll unintentionally get unweighted means and not know about it!
I'm interpreting this differently: the dataset has some data variables that are weighted and some that are unweighted. There's no `ds.weighted(ds_weights)` where a missing data variable in `ds_weights` creates an unweighted data variable?
Instead it's `db = da.weighted(dw)`, where `db` is an array, and that array is assigned to the dataset.
(when I'm confused during a discussion of ours, it's 3 times out of 4 me who's missing something, so asking from the perspective of likely being wrong but hopefully nonetheless helpful)
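A small sketch of the `db = da.weighted(dw)` pattern, with made-up values. `weighted` returns a `DataArrayWeighted` wrapper object rather than a `DataArray`, so this sketch assigns the reduced result to the dataset; all variable names are illustrative.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x")  # illustrative data
dw = xr.DataArray([1.0, 1.0, 2.0], dims="x")  # illustrative weights

# The weighted wrapper: reductions on it take the weights into account.
db = da.weighted(dw)
print(type(db).__name__)

# Store the weighted and unweighted means side by side in one dataset.
ds = xr.Dataset({"unweighted": ("x", [10.0, 20.0, 30.0])})
ds["a_weighted_mean"] = db.mean("x")
ds["unweighted_mean"] = ds["unweighted"].mean("x")
print(ds)
```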
What happened?
I have a `Dataset` with some weighted `DataArrays`. This set-up is extremely useful to me as I can filter and perform operations over the whole dataset and all shared dimensions. One of the `DataArrays` is weighted, and I was hoping this would be automatically handled in groupbys and general reduction operations, but the error below is thrown if I call `mean` on the dataset.
I'm happy to raise a PR to fix this if I can work out how to do it, but I just want to make sure that it's agreed that this isn't correct behaviour.
What did you expect to happen?
I would like `DataArray`s that are unweighted to return the usual mean, and for `DataArrayWeighted` to return a mean reflecting their weights, as if I'd just called `da_weighted.mean()`. This would allow me to calculate means in groupbys on the `Dataset`.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
There are several functions that might fall into this category, like `std`, but I think they could all be handled similarly.
Environment