derhintze opened 6 days ago
Can confirm that the output is the same with xarray 2024.6.0
I believe this may be intentional (I may be wrong, though): it is often not so useful to reduce the coordinates with the same operation as the data, so xarray drops them instead. If you really need this, you can convert them to data variables first using .reset_coords(names), do the reduction, then use .set_coords(names).
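A minimal sketch of that workaround; the dataset, variable, and coordinate names here are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: a data variable and a non-index coordinate
# that both span "dim0".
ds = xr.Dataset(
    {"signal": ("dim0", np.arange(4.0))},
    coords={"temperature": ("dim0", [20.0, 21.0, 22.0, 23.0])},
)

# Promote the coordinate to a data variable so the reduction applies
# to it as well, then demote it back to a coordinate afterwards.
reduced = (
    ds.reset_coords("temperature")
    .mean(dim="dim0")
    .set_coords("temperature")
)
# reduced["temperature"] is now the mean (21.5) instead of being dropped.
```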
@keewis Thanks! I'm not sure if it's "often" not so useful, tho ;) Can't come up with a reasonable example from our field (2D sensor data processing), but I get the point. I did what you suggest as a work-around, but I had hoped for a better solution. A bit tedious. The thing is, coarsen indeed does mean coords by default. So also some contraption like data.coarsen({"dim0": data.sizes["dim0"]}).mean(dim="dim0").squeeze() would work. But reading this, imho, suggests that data.mean(dim="dim0") should do the same... but well, that's subjective ;)
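For comparison, a sketch of that coarsen-based contraption, using the same hypothetical dataset names as above and a window covering the whole dimension; coarsen applies coord_func (which defaults to "mean") to coordinates, so the coordinate survives the reduction:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"signal": ("dim0", np.arange(4.0))},
    coords={"temperature": ("dim0", [20.0, 21.0, 22.0, 23.0])},
)

# One window spanning all of "dim0": coarsen averages the coordinate
# along with the data, then squeeze removes the now size-1 dimension.
reduced = ds.coarsen(dim0=ds.sizes["dim0"]).mean().squeeze("dim0")
```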
This is indeed intentional: the role of coordinates is to be things that aren't computed along. That's particularly the case when doing something like .lag (we don't want the coords lagging), but it is also the case with a reduction.
Are there times where xarray is inconsistent there? Is there an example of where something "should" be a coordinate but should also be reduced over?
Maybe we could add an option to the reductions that allows changing this behavior? Something like data.mean(dim="dim0", coords="mean") with a default value of "drop". But the workaround could be sufficient here.
@max-sixty
Are there times where xarray is inconsistent there?
Well, if you consider the behaviour I described above regarding coarsen (where coarsen does reduce over coords) to be consistent with not reducing over coordinates, then no, not that I'm aware of. To be fair, though, it's documented that coarsen does average coords by default.
Is there an example of where something "should" be a coordinate but should also be reduced over?
That's a hard question, since it would depend on conventions of what people put into coords. We have time-series of 2D sensor images as data variables, where we want to do operations with, and then add coordinates containing metadata, like temperatures, time stamps, measurement-specific inputs like light-source wave-length or power. In all of those cases, when averaging over the time-series of 2D sensor data, we'd like to average the coordinates, too.
Granted, given there are work-arounds, and we can implement our own wrapping for this sort of stuff, it's not a big deal.
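A sketch of what such a wrapper could look like; mean_with_coords is a hypothetical helper, not part of xarray. It promotes every non-index coordinate that spans the reduced dimension, reduces, then restores coordinate status:

```python
import numpy as np
import xarray as xr

def mean_with_coords(ds: xr.Dataset, dim: str) -> xr.Dataset:
    """Like ds.mean(dim=dim), but averages coordinates spanning `dim`
    instead of dropping them. Hypothetical convenience wrapper; index
    coordinates (name == dim) are excluded, since they cannot be reset."""
    names = [
        name
        for name, coord in ds.coords.items()
        if dim in coord.dims and name != dim
    ]
    return ds.reset_coords(names).mean(dim=dim).set_coords(names)

# Usage with a made-up dataset of the kind described above:
# a time-series of sensor rows plus per-measurement metadata.
ds = xr.Dataset(
    {"image": (("time", "x"), np.arange(8.0).reshape(4, 2))},
    coords={"temperature": ("time", [20.0, 21.0, 22.0, 23.0])},
)
averaged = mean_with_coords(ds, "time")
# averaged["temperature"] is the mean over "time" and is still a coordinate.
```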
Yes, very reasonable @derhintze !
Good point around coarsen. I do think that's somewhat specific to coarsen, where it's applying a transformation to coords / labels. I agree it makes the separation a bit fuzzier.
I would vote to retain the behavior around coords: data.mean(dim="dim0", coords="mean") seems not much simpler than moving coords to vars, and it introduces more surface area to the API...
What happened?
Averaging the data variables along some dimension drops coordinates that also have that dimension.
What did you expect to happen?
I would expect that the coordinates aren't dropped, but averaged along said dimension, too.
Minimal Complete Verifiable Example
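The original example is not preserved here; a minimal sketch that reproduces the described behavior (all names are illustrative) might look like:

```python
import numpy as np
import xarray as xr

# A data variable and a coordinate that share the dimension "dim0".
ds = xr.Dataset(
    {"signal": ("dim0", np.arange(4.0))},
    coords={"temperature": ("dim0", [20.0, 21.0, 22.0, 23.0])},
)

reduced = ds.mean(dim="dim0")
# The "temperature" coordinate is dropped rather than averaged:
print("temperature" in reduced.coords)  # False
```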
Anything else we need to know?
I had a look at #1470 and #3510, but those appear unrelated?