sdtaylor / phenology_forecasts

The backend for http://phenology.naturecast.org

PrettyBigData Issues #4

Open sdtaylor opened 6 years ago

sdtaylor commented 6 years ago

Notes on bottlenecks when dealing with hundreds of GB (several TB in the future) of weather forecast data.

Using chunks in NetCDF: https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters

Xarray discussion on aligning dask and NetCDF chunks: https://github.com/pydata/xarray/issues/1440

Example of using apply_ufunc to downscale observed and modelled arrays: https://groups.google.com/forum/#!topic/xarray/eyWr_ajTmL4

sdtaylor commented 6 years ago

On saving the downscale_model coefficients file:

downscale_model.to_netcdf('downscale_compress_test.nc', 
encoding={'slope':{'zlib':True,'complevel':9}, 
          'intercept':{'zlib':True,'complevel':9}})

72MB

downscale_model.to_netcdf('downscale_compress_test.nc', 
encoding={'slope':{'zlib':True,'complevel':9, 'dtype':'int16', 'scale_factor':0.0001,  '_FillValue': -99999}, 
          'intercept':{'zlib':True,'complevel':9, 'dtype':'int16', 'scale_factor':0.0001,  '_FillValue': -99999}})

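18MB, but potentially lossy: the mean intercept and slope differ before and after saving. With int16 and a scale_factor of 0.0001, only values between about -3.2768 and 3.2767 can be represented, and the _FillValue of -99999 is itself outside the int16 range, which may explain the difference. Saving as int32 instead fixes that.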
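
One way to check whether a packed integer encoding can hold the coefficients is to compute the range of values it can represent. The sketch below is plain Python rather than xarray code. It assumes the usual NetCDF packing convention (unpacked = packed * scale_factor + add_offset), and `int_limits`, `packed_range` and `fits_dtype` are hypothetical helper names used only for illustration.

```python
def int_limits(bits):
    """Return (min, max) of a signed integer type with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def packed_range(bits, scale_factor, add_offset=0.0):
    """Range of real values representable after packing (hypothetical helper)."""
    lo, hi = int_limits(bits)
    return lo * scale_factor + add_offset, hi * scale_factor + add_offset


def fits_dtype(value, bits):
    """True if an integer (for example a _FillValue) fits in the signed type."""
    lo, hi = int_limits(bits)
    return lo <= value <= hi


# The two encodings tried in this thread.
int16_range = packed_range(16, 0.0001)
int32_range = packed_range(32, 0.00001)

print("int16 range:", int16_range)
print("int32 range:", int32_range)
print("-99999 fits int16:", fits_dtype(-99999, 16))
print("-99999 fits int32:", fits_dtype(-99999, 32))
```

Under these assumptions the int32 encoding with a scale_factor of 0.00001 covers roughly -21474.8 to 21474.8 and can hold the -99999 fill value, while the int16 encoding cannot.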

Final version:

downscale_model.to_netcdf('downscale_compress_test.nc', 
encoding={'slope':{'zlib':True,'complevel':9, 'dtype':'int32', 'scale_factor':0.00001,  '_FillValue': -99999}, 
          'intercept':{'zlib':True,'complevel':9, 'dtype':'int32', 'scale_factor':0.00001,  '_FillValue': -99999}})

23MB
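
The before/after comparison of means can also be sketched without writing a file. This is a minimal illustration of the rounding that packing with scale_factor 0.00001 introduces, not xarray's actual implementation. The sample values are made up, and `pack` and `unpack` are hypothetical helpers that assume the values fit in the integer type.

```python
from statistics import mean


def pack(value, scale_factor, add_offset=0.0):
    """Convert a real value to its packed integer (hypothetical helper)."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset=0.0):
    """Convert a packed integer back to a real value (hypothetical helper)."""
    return packed * scale_factor + add_offset


scale = 0.00001
# Made-up coefficient values, standing in for slope or intercept.
values = [0.123456789, -4.5, 12.000004, 250.75]

restored = [unpack(pack(v, scale), scale) for v in values]

# Rounding to the nearest packed integer bounds each error by half a step.
max_error = max(abs(a - b) for a, b in zip(values, restored))
mean_shift = abs(mean(values) - mean(restored))

print("max error:", max_error)
print("mean shift:", mean_shift)
```

For values inside the representable range, each restored value differs from the original by at most half the scale_factor, so the means should agree to about the same precision.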