pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

expose _to_temp_dataset / _from_temp_dataset as semi-public API? #4837

Open keewis opened 3 years ago

keewis commented 3 years ago

When writing accessors that behave the same for both Dataset and DataArray, it would be incredibly useful to be able to use DataArray._to_temp_dataset / DataArray._from_temp_dataset to deduplicate code. Is it safe to use those private methods in external packages (like pint-xarray)?

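Otherwise, I guess it would be possible to use the public API:

from xarray.testing import assert_identical

# to_dataset needs a name, so fall back to a placeholder for unnamed arrays
name = da.name if da.name is not None else "__temp"
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    new_da = new_da.rename(None)  # drop the placeholder name again
assert_identical(da, new_da)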

but that seems less efficient.
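For illustration, this is roughly what the deduplication could look like with the private helpers. It is a sketch, not pint-xarray code: the accessor name `scaledemo` and the `_scale` function are hypothetical, and because `_to_temp_dataset` / `_from_temp_dataset` are private, their behavior may change between xarray releases.

```python
import xarray as xr


def _scale(ds: xr.Dataset, factor: float) -> xr.Dataset:
    # Shared implementation, written once against the Dataset API only.
    return ds.map(lambda var: var * factor)


@xr.register_dataset_accessor("scaledemo")  # hypothetical accessor name
class DatasetScaleAccessor:
    def __init__(self, ds: xr.Dataset):
        self._obj = ds

    def scale(self, factor: float) -> xr.Dataset:
        return _scale(self._obj, factor)


@xr.register_dataarray_accessor("scaledemo")  # hypothetical accessor name
class DataArrayScaleAccessor:
    def __init__(self, da: xr.DataArray):
        self._obj = da

    def scale(self, factor: float) -> xr.DataArray:
        # Private round trip: wrap the array in a one-variable Dataset,
        # run the shared code, then unwrap. The name is restored from self._obj.
        temp = self._obj._to_temp_dataset()
        return self._obj._from_temp_dataset(_scale(temp, factor))
```

Only the DataArray accessor knows about the round trip, so any further Dataset-level logic added to `_scale` is shared automatically.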

dcherian commented 3 years ago

Related: There's some weirdness about _to_temp_dataset not preserving the name. I had to work around that for map_blocks:

https://github.com/pydata/xarray/blob/bc35548d96caaec225be9a26afbbaa94069c9494/xarray/core/parallel.py#L74-L94

keewis commented 3 years ago

That's true. I guess for that there's the name parameter to _from_temp_dataset, which, of course, can't be used if _to_temp_dataset and _from_temp_dataset are called in different functions.
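A minimal sketch of that name behavior, assuming current xarray internals (private methods, may change): the temporary Dataset does not key its data variable by the array's name, so a function that only receives that Dataset cannot recover it. `_from_temp_dataset` takes the name from the DataArray it is called on, and the `name=` argument overrides it. `process` is a hypothetical stand-in for a Dataset-level function.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="x", coords={"x": [0, 1]}, name="a")


def process(ds: xr.Dataset) -> xr.Dataset:
    # Hypothetical Dataset-level step. It only sees the temporary Dataset,
    # whose single data variable is stored under a private placeholder key,
    # so the original name "a" is not available here.
    return ds * 10


# Converted and restored in the same scope: the name is taken from `da`.
same_scope = da._from_temp_dataset(process(da._to_temp_dataset()))

# The `name` parameter sets the name explicitly instead.
renamed = da._from_temp_dataset(process(da._to_temp_dataset()), name="b")

print(same_scope.name, renamed.name)
```

Each round trip calls `_to_temp_dataset` again because, in the implementations I've looked at, `_from_temp_dataset` pops the data variable out of the Dataset it receives, so that Dataset should not be reused.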

max-sixty commented 3 years ago

That seems very reasonable.

To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that. Does that make sense?
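A rough sketch of such a helper, assuming it would live next to the private methods it wraps. The name `apply_to_dataset` and its signature are hypothetical; this is not an existing xarray function.

```python
from typing import Any, Callable

import xarray as xr


def apply_to_dataset(
    da: xr.DataArray,
    func: Callable[..., xr.Dataset],
    *args: Any,
    **kwargs: Any,
) -> xr.DataArray:
    """Hypothetical helper: run a Dataset -> Dataset function on a DataArray.

    Relies on the private DataArray._to_temp_dataset / _from_temp_dataset
    pair, so it would need to live inside xarray (or track its internals).
    """
    temp = da._to_temp_dataset()
    result = func(temp, *args, **kwargs)
    if not isinstance(result, xr.Dataset):
        raise TypeError(f"func must return a Dataset, got {type(result).__name__}")
    # The array's name is restored from `da`, not from the Dataset.
    return da._from_temp_dataset(result)


da = xr.DataArray([3.0, 1.0, 2.0], dims="x", coords={"x": [30, 10, 20]}, name="a")
# Unbound Dataset methods work as `func`; extra arguments are forwarded.
sorted_da = apply_to_dataset(da, xr.Dataset.sortby, "x")
```

If `func` drops the placeholder data variable, the final `_from_temp_dataset` call would fail with a `KeyError`, so a real implementation might want a friendlier error for that case as well.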

keewis commented 3 years ago

> To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that.

Yes, I think that would work, too.

shoyer commented 3 years ago

> Related: There's some weirdness about _to_temp_dataset not preserving name.

It used to preserve the name, but now the name is always set to the _THIS_ARRAY object. This is for two reasons:

  1. It isn't always possible to convert a DataArray into a Dataset while preserving its name: the conversion fails if the name is also used by a coordinate.
  2. To ensure that xarray functions always work, even in this case, it was more reliable to always replace the name. If we only replaced name in case (1), then it's likely that some functions wouldn't handle names of that form and would produce an error only in that case.
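Case (1) can be reproduced with a coordinate pulled out of a Dataset, which is a DataArray whose name matches one of its own coordinates. This sketch relies on the private `_to_temp_dataset` / `_from_temp_dataset` pair; the exact error message from `to_dataset` may vary between xarray versions, so the example only checks the exception type.

```python
import xarray as xr

# Selecting a coordinate gives a DataArray named "x" that also carries
# a coordinate named "x".
coord = xr.Dataset(coords={"x": [0, 1]})["x"]

try:
    coord.to_dataset()
    public_conversion_failed = False
except ValueError:
    # "x" cannot be both the data variable and the coordinate.
    public_conversion_failed = True

# The temporary Dataset stores the data under a placeholder key instead,
# so the conversion succeeds and the name survives the round trip.
temp = coord._to_temp_dataset()
roundtrip = coord._from_temp_dataset(temp)
```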

> To the extent the goal is "apply a function that takes a dataset to this dataarray", we could make a function that does exactly that and use _to_temp_dataset within that. Does that make sense?

This sounds like a fine idea to me. It's kind of the opposite of Dataset.map.