Open max-sixty opened 4 months ago
As an aside, the API isn't great but this works in flox (I think)
import flox.xarray
flox.xarray.xarray_reduce(da, "labels1", "labels2", func="mean")
In your example, I think you need to change the dimension time to combined on the array too. To me, it's not obvious why Xarray should do this automatically, but I'm also quite confused by set_index on a non-existing dimension. Seems wild.
Yes, possibly we should raise an error on this?
Possibly our indexing work means we can now create multiple indexes on a dimension, so we want to be able to .set_index(combined=['labels1', 'labels2']) even though combined isn't a dimension. But the rest of the library hasn't caught up, so we get this incoherent groupby behavior?
Regardless, I'm not sure what the (combined) in * combined (combined) object 48B MultiIndex means given we don't have a combined dimension...
You've created a new variable named combined with dimension name combined and assigned an index to combined. Really what you want is a new variable combined with dimension name time. I don't know that there's an ergonomic way of doing that AND stacking at the same time. da.stack expects to stack dimensions.
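For comparison, here is a minimal sketch of the case where the index and the data do share a dimension. It does not produce a variable named combined; it simply builds the MultiIndex on the existing time dimension, which mirrors the working d example described in the issue. The data values and label arrays are hypothetical, chosen only for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical data: six points along "time" with two label coordinates.
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 7.0, 4.0, 3.0]),
    dims="time",
    coords={
        "labels1": ("time", ["a", "b", "c", "c", "b", "a"]),
        "labels2": ("time", ["x", "y", "z", "z", "y", "x"]),
    },
)

# Build the MultiIndex on the existing "time" dimension, so the index
# and the data share a dimension name (unlike set_index(combined=...)).
indexed = da.set_index(time=["labels1", "labels2"])

# Group by the MultiIndex: one mean per unique (labels1, labels2) pair.
result = indexed.groupby("time").mean()
print(result)
```

Under these assumptions the result should have one entry along time per unique label pair, rather than a value for every combination of combined and time.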
But there's no dimension named combined on the data array! From above:
But it's not a dimension, the array lists <xarray.DataArray (time: 6)> Size: 48B as the dimensions. What's a good mental model for this?
Ah, now I see. Yes, that's a bug.
What is your issue?
I know we've done lots of great work on indexes. But I worry that we've made some basic things confusing. (Or at least I'm confused, very possibly I'm being slow this morning?)
A colleague asked how to group by multiple coords. I told them to set_index(foo=coord_list).groupby("foo"). But that seems to work inconsistently.
Here it works great:
Then make the two coords into a multiindex along d and group by d, and we successfully get a value for each of the three values on d:
But here it doesn't work as expected:
Then we try grouping by combined, and we get a value for every value of combined and time?
I'm guessing the reason is that combined (combined) object 48B MultiIndex is treated as a dimension? But it's not a dimension, the array lists <xarray.DataArray (time: 6)> Size: 48B as the dimensions. What's a good mental model for this?
We could use reindexed.groupby('combined').mean(...) or reindexed.groupby('combined').mean('time') to reduce over the time dimension. But that gets even more confusing: we then don't reduce over the groups of combined!
To the extent this sort of point is correct and it's not just me misunderstanding: I know we've done some really great work recently on expanding xarray forward. But one of the main reasons I fell in love with xarray 10 years ago was how explicit the API was relative to pandas. And I worry that this sort of thing sets us back a bit.