pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Append along an unlimited dimension to an existing netCDF file #1672

Open shoyer opened 7 years ago

shoyer commented 7 years ago

This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on our existing support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the dimension to extend.
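A hypothetical sketch of what that could look like. Note that `unlimited_dims` is a real `to_netcdf()` argument, but `extend` is only the keyword proposed in this issue and does not exist yet, and `ds`/`ds_next_step` are placeholder datasets:

```python
import xarray as xr

# First write creates the file with "time" as an unlimited dimension:
ds.to_netcdf("simulation.nc", mode="w", unlimited_dims=["time"])

# Hypothetical follow-up write appends along that dimension; `extend`
# is the proposed, not-yet-implemented keyword:
ds_next_step.to_netcdf("simulation.nc", mode="a", extend="time")
```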

Hoeze commented 6 years ago

Any updates on this?

jhamman commented 6 years ago

None that I'm aware of. I think this issue is still in the "help wanted" stage.

mullenkamp commented 5 years ago

I would love to have this capability. As @shoyer mentioned, being able to add time steps of any sort to existing netCDF files would be really beneficial. The only real alternative is to save a separate netCDF file for each additional time step, even when there are tons of time steps and each file is only a couple hundred KB (which is my situation with NASA data).
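For what it's worth, the in-memory equivalent of that one-file-per-step workaround is just a `concat` along the time dimension (variable names here are illustrative; with real files you would use `xr.open_mfdataset` instead):

```python
import xarray as xr

# One tiny dataset per simulated time step, as if each had been
# written to its own netCDF file.
steps = [
    xr.Dataset(
        {"temp": (("time",), [float(i)])},
        coords={"time": [i]},
    )
    for i in range(3)
]

# Combine all steps along the "time" dimension.
combined = xr.concat(steps, dim="time")
```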

I'll look into it if I get some time...

thomas-fred commented 5 years ago

This would be extremely helpful for our modelling of time varying renewable energy.

hmaarrfk commented 4 years ago

I think I got a basic prototype working.

That said, I think a real challenge lies in supporting the numerous backends and lazy arrays.

For example, with the netCDF4 library I was only able to append data in rather peculiar ways, which may trigger expensive computations multiple times.

Is this a use case that we must optimize for now?

hmaarrfk commented 4 years ago

It's a small prototype, but maybe it can help move development along.

```python
import netCDF4
import xarray as xr


def _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size):
    # For time deltas, we must ensure that we use the same encoding as
    # what was previously stored.
    # We likely need to do this as well for variables that had custom
    # encodings too.
    if hasattr(nc_variable, 'calendar'):
        data.encoding = {
            'units': nc_variable.units,
            'calendar': nc_variable.calendar,
        }
    data_encoded = xr.conventions.encode_cf_variable(data)  # , name=name)
    left_slices = data.dims.index(expanding_dim)
    right_slices = data.ndim - left_slices - 1
    nc_slice = (
        (slice(None),) * left_slices
        + (slice(nc_shape, nc_shape + added_size),)
        + (slice(None),) * right_slices
    )
    nc_variable[nc_slice] = data_encoded.data


def append_to_netcdf(filename, ds_to_append, unlimited_dims):
    if isinstance(unlimited_dims, str):
        unlimited_dims = [unlimited_dims]

    if len(unlimited_dims) != 1:
        # TODO: change this so it can support multiple expanding dims
        raise ValueError(
            "We only support one unlimited dim for now, "
            f"got {len(unlimited_dims)}.")

    unlimited_dims = list(set(unlimited_dims))
    expanding_dim = unlimited_dims[0]

    with netCDF4.Dataset(filename, mode='a') as nc:
        nc_coord = nc[expanding_dim]
        nc_shape = len(nc_coord)

        added_size = len(ds_to_append[expanding_dim])
        variables, attrs = xr.conventions.encode_dataset_coordinates(
            ds_to_append)

        for name, data in variables.items():
            if expanding_dim not in data.dims:
                # Nothing to do, data assumed to be identical
                continue

            nc_variable = nc[name]
            _expand_variable(
                nc_variable, data, expanding_dim, nc_shape, added_size)


from xarray.tests.test_dataset import create_append_test_data
from xarray.testing import assert_equal

ds, ds_to_append, ds_with_new_var = create_append_test_data()

filename = 'test_dataset.nc'
ds.to_netcdf(filename, mode='w', unlimited_dims=['time'])
append_to_netcdf('test_dataset.nc', ds_to_append, unlimited_dims='time')

loaded = xr.load_dataset('test_dataset.nc')
assert_equal(xr.concat([ds, ds_to_append], dim="time"), loaded)
```
espiritocz commented 3 years ago

Hi, I consider this extremely useful!

Is your prototype already part of some library (or should we expect it in xarray)?

many thanks for the code

hmaarrfk commented 3 years ago

It isn't really part of any library, and I don't have plans to turn it into a public one. I think the discussion is really about the xarray API and which functions to implement first.

Then somebody can take the code and integrate it into the decided upon API.

ChrisBarker-NOAA commented 4 weeks ago

Any movement on this? I'd love to have this -- kinda critical for some of my work.

@hmaarrfk seems to have made a start, and it doesn't look too hairy :-)