Open haakon-e opened 1 week ago
Not much we can do with a broadcast. Maybe we need a dedicated combine
function for this.
combine(mean, groupby(A, Ti => year))
We can store the time dim in DimGroupByArray so we know to pass it to dims
in mean
?
Or maybe combine
accepts keywords that get passed to the function
Not much we can do with a broadcast. Maybe we need a dedicated
combine
function for this.
I don't quite understand this. Why then does mean.(groupby(A, Ti => year))
return a DimArray
instead of DimGroupByArray
?
Alternatively, if we define mean(B::DimGroupByArray)
directly, I should be able to do what I expect.
In fact, currently doing mean(B)
somewhat confusingly (as I see it) averages across groups. Perhaps that's worth a separate issue:
Defining something as naive as: (which is essentially what it appears you propose, except I'm skipping combine
)
function Statistics.mean(B::DimensionalData.DimGroupByArray)
groupby_dim = name(dims(B))
reduced_B = mean.(B, dims=groupby_dim)
cat(reduced_B..., dims=groupby_dim)
end
appears to work (and is essentially what I'm doing already), but a look at it with @code_warntype
appears it does not yield type-stable code, because I just do mean.(B, dims=groupby_dim)
under the hood.
so... perhaps something like combine
and some clever use of the stored type information, as you suggest, is indeed a reasonable way forward?
I was trying to check how e.g. xarray does this, but they seem to essentially directly support mean(grouped_array)
.
The problem is it's pretty ambiguous what mean
means here. XArray hacks mean to do an inner mean and cat like you want. But that's not very Julian.
If DimGroupbyArray is actually an AbstractArray we can't break the interface - it's an array of arrays and should act like any other array of arrays
I guess you are suggesting it should not be an AbstractArray ? (To not get that confusing result from mean
)
And yes combine
is specifically my idea to do what XArray does but without the magic inner mean behaviour from the regular mean
method. This will also mean it works for any function, which is not true in xarray
Alright, I see it's not julian to apply mean(B, dims=Ti)
directly with the output I am expecting, so combine
seems to be a good idea. Do you know if the use of combine
is found in other similar packages/languages? It's probably a good idea to use terminology/approaches people find intuitive already.
I'm not sufficiently steeped in this code base to have a well-informed opinion whether DimGroupByArray
should be an AbstractArray
or not. That said, I don't imagine the use of the current result from mean(B::DimGroupByArray)
, so perhaps a dispatch could be added which errors? I'm not sure if changing the underlying abstract type could have other unintended consequences..
Suppose I define
Then I want to groupby and average over years. If I just do "vanilla" mean, I get a
DimArray
back:However, this also averages over
X
. So instead, I specifymean(A, dims=Ti)
This keeps the output as a
DimGroupByArray
. I feel I should get aDimArray
back here?The extra step I need to do to get a DimArray is
cat
:I wonder if this last step could be baked into the reduction step? If not, perhaps we could think of some way of documenting how to get back a
DimArray
from aDimGroupByArray
(e.g. usingcat
).