Open zerothi opened 1 week ago
What would you suggest as a change?
Good question, it probably involves much more elaboration.
Returns a new DataArray with renamed coordinates, dimensions or a new name. The new DataArray will share memory location with the old DataArray:
np.shares_memory(new.values, old.values) is True
.
I don't know how to do this succinctly without ambiguity. ;)
It was just a puzzle to me since I didn't know what to expect from doing the operation. And since there is no declaration of the intended use, I am not sure I can trust xarray
to do the same in newer releases. Since I would depend on it not making a copy, I just wanted to know what xarray intends to do, and if it wants to keep the current behavior, I would like it emphasized.
I guess the same question/clarification could be said about https://docs.xarray.dev/en/latest/generated/xarray.Dataset.rename.html and friends.
I can make a PR, in case there is a common ground as to what it should say... :)
I think the distinction here is between a new DataArray and a copy of the underlying data.
To me that's less confusing but possibly that's because I'm too deep in it.
One issue is that a substantial number of methods do this, and we don't want to have the same description on every method...
I see, if I read new DataArray I would implicitly assume there is no connection between the new and the old object. Where does the boundary of new
lie? Dimension names? Coords and values, data values, attributes, etc? It isn't clear (at least to me) what new
encompasses.
I agree about the number of methods that does this would mean a lot of changes, and digging in the code. I could, of course, also be wrong here. But I would think that most users don't know about this detail.
I.e. this might suprise some users:
import xarray as xr
import numpy as np
data = np.random.rand(3, 3)
print(data)
a = xr.DataArray(
data,
dims=list("xy"),
)
print(a)
b = a.rename({"x": "z"})
print(b)
b[0, 1] = 2
print(a)
One change could just be:
- Returns a new DataArray with renamed coordinates, dimensions or a new name.
+ Returns a DataArray with renamed coordinates, dimensions or a new name.
...which is possibly less confusing while being simpler. Then if folks want to know about when xarray copies the underlying data, they can look at .copy
and learn more there.
Would accept this as a PR unless there are any objections
I don't see the connection to copy
, if anything it should be:
- Returns a new DataArray with renamed coordinates, dimensions or a new name.
+ Returns a shallow copied(?) DataArray with renamed coordinates, dimensions or a new name.
Is copied
necessary here?
for absolute clarity. With your proposal you still don't know if it is a shallow or deep copy. And reading copy does not give any more information.
OK, I had thought that this might shed some light on the difference between shallow & deep copies: https://github.com/pydata/xarray/blob/a74a60508edd844ef425637f27598f1d8c5385a2/xarray/core/dataarray.py#L1230-L1237. On reflection I agree it's a bit tangential.
Personally I think that "Returns a shallow copied DataArray" is too many words + not a great description of what's going on — it's a new object rather than a shallow copy. But if others believe it to be better, very happy to take the majority view.
I would support removing the new
given it reduces the word count + I think removes some minor confusion...
And if we don't already have a section on how most method work — i.e. they return a new object with the original data as a property — then that would be a great contribution.
I agree it isn't easy. Hard to clarify without lots of words. Perhaps for the sake of changing something we could stick with removing new
, then if this pops up later, we at least have a reference here.
I just think it would be simpler if the documentation clarifies the intention and behaviour, since currently I do tests to see whats happening, but I don't know if these things changes down the line.
The same thing happened to scipy objects (sparse matrices) where __i<op>__
methods didn't actually behave like one would think. I know it is hard, but clarity is nice! ;)
Yes totally!
For xarray, the mental model is that all methods:
I think that mental model gets us to a simple correct answer 98% of the time (the only remaining ambiguities are a couple of small cases on whether a copy is required)
What is your issue?
From the documentation: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.rename.html#xarray.DataArray.rename it isn't clear if the contained data is copied (new memory), or whether it is retained.
It says:
which may indicate that it copies, however it doesn't:
Now:
inplace
arguments allowing this.