pp-mo / ncdata

Free and efficient exchange of data between netcdf files, Xarray and Iris
https://ncdata.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Stop from-iris saves creating full variable data arrays. #62

Closed pp-mo closed 9 months ago

pp-mo commented 9 months ago

This allows larger-than-memory saves.

Found this while fixing a similar bug in Iris itself: https://github.com/SciTools/iris/issues/5753. The way it was using Dask made it impossible to stream large data (of some sorts) to disk without fetching everything. It seemed that using ncdata to save via xarray instead would be a reasonable workaround.

However, in ncdata, there was another problem: I had believed that np.ones etc. created a placeholder object that did not allocate space immediately (as noted in the previous comments on this code). This was plain wrong!
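To illustrate the difference, here is a minimal sketch (not the ncdata code itself; the shape and chunking are arbitrary choices for the example). `np.ones` builds and fills a real array straight away, whereas `dask.array.ones` only records a task graph and produces data chunk by chunk when asked.

```python
import numpy as np
import dask.array as da

# Arbitrary example shape: 1000 x 1000 float64 values.
shape = (1000, 1000)

# Eager: numpy allocates the buffer and writes 1.0 into every element now.
eager = np.ones(shape, dtype=np.float64)
eager_bytes = eager.nbytes  # size of the buffer actually held in memory

# Lazy: dask only describes the array; no chunk exists until it is computed.
lazy = da.ones(shape, dtype=np.float64, chunks=(100, 1000))
n_chunks = lazy.npartitions

# Realising a single chunk touches only that chunk's worth of data.
one_chunk = lazy.blocks[0].compute()
```

Note that `lazy.nbytes` reports the logical size of the full array, not what is currently allocated, so it is not a useful check for laziness.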

This replaces the all-missing initial values array with a lazy one. That means we cannot do a createVariable on a Nc4DatasetLike and then a partial write to the variable, as was originally envisaged (for compatibility with an actual file variable). However, Iris, at least, never actually does that. And in fact there is no Nc4VariableLike.__setitem__, and never was.
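As a rough sketch of what a lazy all-missing placeholder can look like (an assumption for illustration, not necessarily how ncdata builds it): fill a dask array with a sentinel and mask every point equal to it. The helper name `lazy_all_missing`, the sentinel value and the chunk size are all hypothetical.

```python
import numpy as np
import dask.array as da

# Hypothetical sentinel used only to build the mask in this example.
FILL = -999.0


def lazy_all_missing(shape, dtype=np.float64, chunks="auto"):
    """Return a dask array whose every point is masked, without allocating it."""
    filled = da.full(shape, FILL, dtype=dtype, chunks=chunks)
    # masked_equal is applied per chunk, so the result stays lazy.
    return da.ma.masked_equal(filled, FILL)


# A large logical array (about 8 GB of float64): creating it is cheap.
placeholder = lazy_all_missing((100_000, 10_000), chunks=(1_000, 10_000))

# Only the small region requested here is ever realised.
corner = placeholder[:10, :10].compute()
```

Because nothing is stored until a chunk is computed, the placeholder costs almost nothing however large the variable is.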

pp-mo commented 9 months ago

Basically just fixes performance.
Could add a peak-memory test, e.g. with tracemalloc (see notes in https://github.com/SciTools/iris/issues/5753). But I don't think it's worth adding that kind of testing just now.
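For reference, a peak-memory check of that kind could look roughly like the sketch below. It uses only the standard library, and the two "save" functions are hypothetical stand-ins (plain bytearray allocations), not the real from-iris save; a real test would call the actual save inside `peak_bytes` and compare the result against the size of the data.

```python
import tracemalloc

CHUNK = 1_000_000   # bytes per chunk (arbitrary for this example)
N_CHUNKS = 20       # total "data" size is CHUNK * N_CHUNKS


def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def save_all_at_once():
    # Stand-in for a save that realises the whole array first.
    data = bytearray(CHUNK * N_CHUNKS)
    return len(data)


def save_in_chunks():
    # Stand-in for a streamed save: only a chunk or two is alive at a time.
    total = 0
    for _ in range(N_CHUNKS):
        chunk = bytearray(CHUNK)
        total += len(chunk)
    return total


eager_peak = peak_bytes(save_all_at_once)
streamed_peak = peak_bytes(save_in_chunks)
```

The assertion in such a test would be that the streamed peak stays well below the full data size, with some headroom for interpreter overhead.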