Closed max-sixty closed 12 months ago
I would be fine with a keep_coords argument.
I'm wary of always keeping coordinates, because some applied operations could make existing coordinates no longer valid. For example, suppose you want to use pandas's faster time-resampling, i.e., ds.apply(lambda x: x.to_pandas().resample('24H')). Any coordinates along the time dimension would no longer be valid. We could automatically align the coordinates, but that starts to get increasingly magical...
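To make that concern concrete, here is a minimal sketch, not the code from this thread. It assumes a recent xarray, where Dataset.map is the current name for Dataset.apply, and it uses made-up data, a made-up "hour" coordinate, and a hypothetical function name. It resamples with "D" (24-hour bins) rather than the '24H' spelling, and calls .mean() so the pandas result is an actual series:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: 96 hourly values plus an auxiliary coordinate along time.
times = pd.date_range("2000-01-01", periods=96, freq="60min")
ds = xr.Dataset(
    {"temp": (("time",), np.arange(96.0))},
    coords={"time": times, "hour": (("time",), times.hour.values)},
)


def daily_mean_via_pandas(da):
    # Hypothetical callable: round-trip through pandas and resample to daily means.
    daily = da.to_pandas().resample("D").mean()
    return xr.DataArray(daily.values, dims=("time",), coords={"time": daily.index})


daily = ds.map(daily_mean_via_pandas)

# The result has fewer time steps than the original "hour" coordinate,
# so the old coordinate cannot simply be copied onto the result.
print(daily.sizes["time"], ds["hour"].sizes["time"])
```

Any automatic "keep coordinates" behaviour would have to either skip coordinates like "hour" here or try to realign them, which is the magic being warned about.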
Great @shoyer, agreed
Also, attrs get cleared, which I think should be retained by default?
Are there plans for a 'keep_coords' for Dataset.resample as well?
@snowman2 Possibly yes, though we would want to think through the use-cases for this first. Arguably, you should explicitly preserve coordinates in your custom callable instead.
You could do it in the custom callable, but it requires less expertise and fewer lines of code to add that as an option. The use case I have is land surface model output with x,y coordinates that I would like to preserve.
@snowman2 Can you give a concrete example of the sort of function you would want to apply?
I need input data for a hydrology model at an hourly timestep. So, I use the Dataset.resample method on data from land surface models to achieve that. Then, I use a custom linear interpolation to fill in the NaNs. I then write out the data to a file. It is easier to write the resampled dataset to the file with the necessary information if the x,y coordinates are not removed in the Dataset.resample method.
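For readers who want to try this workflow, here is a rough sketch under stated assumptions. It uses the keyword-style resample API of recent xarray versions rather than the older positional form used elsewhere in this thread, the built-in interpolate_na in place of the custom interpolation mentioned above, and invented data and variable names:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 3-hourly field on a small x/y grid (illustrative values only).
times = pd.date_range("2000-01-01", periods=3, freq="180min")
values = np.arange(3, dtype=float)[:, None, None] * np.ones((3, 2, 2))
ds = xr.Dataset(
    {"precip": (("time", "y", "x"), values)},
    coords={"time": times, "y": [10.0, 20.0], "x": [1.0, 2.0]},
)

# Upsample to hourly; the newly created time steps are filled with NaN.
hourly = ds.resample(time="60min").asfreq()

# Fill the NaNs by linear interpolation along time.
filled = hourly.interpolate_na(dim="time", method="linear")

# Inspect which coordinates are present on the resampled result.
print(list(filled.coords))
```

Printing the coordinates of the result is a quick way to check whether x and y survived the resampling step before writing the file.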
@snowman2 I tried to reproduce your issue, but I couldn't make resample drop coordinates:
In [21]: ds = xarray.tutorial.load_dataset('rasm')
In [22]: ds.resample('AS', 'time', how=np.sum)
Out[22]:
<xarray.Dataset>
Dimensions:  (time: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * time     (time) datetime64[ns] 1980-01-01 1981-01-01 1982-01-01 1983-01-01
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
@shoyer, thanks for looking into it. I am resampling from 3hr data to 1hr data.
resampled_ds = ds.resample('1H', dim='time', keep_attrs=True)
I am using it here: https://github.com/CI-WATER/gsshapy/blob/f4e5cb13c1d528021e1953859b712553a4162311/gsshapy/grid/grid_to_gssha.py#L789-L844
I ran into the issue there and had to add code to make sure the coordinates were copied.
Thanks!
@snowman2 can you print an example of what self.data looks like? And desired vs. actual output if you remove those lines to add in the coordinates manually?
Strange, but I can't seem to reproduce the issue. Maybe it was on a Windows machine or maybe it is fixed now.
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.
Closing as stale
Generally this isn't a problem, since the coords are carried over by the resulting DataArrays:
But if there's an operation that removes the coords from the DataArrays, the coords are not there on the result (notice c below). Should the Dataset retain them? Either always or with a keep_coords argument, similar to keep_attrs.
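The original example is not reproduced here, so the following is only a sketch of the situation, assuming a recent xarray where Dataset.map is the current name for Dataset.apply. The dataset, the scalar coordinate c, and both function names are made up for illustration; map_keep_coords is a hypothetical helper showing one way a keep_coords option might behave, not an xarray API:

```python
import numpy as np
import xarray as xr

# Illustrative dataset with an index coordinate x and a scalar coordinate c.
ds = xr.Dataset(
    {"a": (("x",), np.arange(3.0)), "b": (("x",), np.arange(3.0) * 10)},
    coords={"x": [0, 1, 2], "c": 42},
)


def strip_coords(da):
    # Stand-in for any operation that returns a DataArray without coords.
    return xr.DataArray(da.values * 2, dims=da.dims)


def map_keep_coords(dataset, func):
    """Hypothetical helper: apply func, then re-attach coords that still fit."""
    result = dataset.map(func)
    restored = {}
    for name, coord in dataset.coords.items():
        if name in result.coords:
            continue
        # Only restore a coordinate if all of its dimensions still exist
        # on the result with the same length.
        fits = all(
            dim in result.sizes and result.sizes[dim] == coord.sizes[dim]
            for dim in coord.dims
        )
        if fits:
            restored[name] = (coord.dims, coord.values)
    return result.assign_coords(restored)


plain = ds.map(strip_coords)
kept = map_keep_coords(ds, strip_coords)

# Compare the coordinates on the two results.
print(list(plain.coords), list(kept.coords))
```

The size check is the conservative choice: coordinates whose dimensions changed length are skipped rather than realigned, which sidesteps the validity concern raised in the replies.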