openclimatefix / MetOfficeDataHub

Python wrapper around MetOffice Atmospheric Model Data REST API
MIT License

Save as netcdf #12

Closed by peterdudfield 2 years ago

peterdudfield commented 2 years ago

Detailed Description

Save joined files as NetCDF (instead of Zarr). I think this is slightly better, as a Zarr store is lots of small files.

JackKelly commented 2 years ago

> I think this is slightly better, as a Zarr store is lots of small files.

Just to clarify:

NetCDF is probably better when you know you'll always want to load the entire dataset into memory. That is probably true when we only have a few hours of NWP data, but probably isn't true when we have days (or more) of NWP data.

Zarr is better when you might be on the cloud, and when you often want to load only part of the dataset into memory. On the cloud, you can't really seek into objects in cloud storage, so if you use a single large NetCDF file then you have to load the entire file from the cloud bucket, even if you only want a small subset of the data.
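To make the trade-off concrete, here is a toy sketch using only the Python standard library. It does not use the real NetCDF or Zarr formats: a single binary file stands in for "one large NetCDF file" and a directory of small files stands in for "a chunked Zarr store". All names (`write_single_file`, `write_chunked`, `CHUNK_LEN`, and so on) are hypothetical and chosen for illustration, and it assumes the whole-object reader cannot seek, as described above.

```python
import struct
import tempfile
from pathlib import Path

CHUNK_LEN = 4   # values per chunk (hypothetical; real chunks are far larger)
FMT = "<d"      # each value is stored as a little-endian float64 (8 bytes)


def _pack(values):
    """Serialise a list of floats to raw bytes."""
    return b"".join(struct.pack(FMT, v) for v in values)


def _unpack(raw):
    """Deserialise raw bytes back to a list of floats."""
    return [v for (v,) in struct.iter_unpack(FMT, raw)]


def write_single_file(path, values):
    """Write everything into one object (stand-in for a single NetCDF file)."""
    path.write_bytes(_pack(values))


def write_chunked(directory, values):
    """Write one small object per chunk (stand-in for a Zarr store)."""
    directory.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(values), CHUNK_LEN):
        (directory / str(i // CHUNK_LEN)).write_bytes(_pack(values[i:i + CHUNK_LEN]))


def read_whole_object(path, start, stop):
    """Fetch the entire object, then slice in memory.

    Returns (values[start:stop], bytes_fetched).
    """
    raw = path.read_bytes()
    return _unpack(raw)[start:stop], len(raw)


def read_chunked(directory, start, stop):
    """Fetch only the chunks that overlap [start, stop); assumes start < stop.

    Returns (values[start:stop], bytes_fetched).
    """
    out, fetched = [], 0
    for idx in range(start // CHUNK_LEN, (stop - 1) // CHUNK_LEN + 1):
        raw = (directory / str(idx)).read_bytes()
        fetched += len(raw)
        chunk = _unpack(raw)
        base = idx * CHUNK_LEN  # global index of the first value in this chunk
        out.extend(chunk[max(start - base, 0):min(stop - base, len(chunk))])
    return out, fetched


def demo():
    """Store 24 values both ways and read the subset [5, 7) from each layout."""
    values = [float(i) for i in range(24)]
    with tempfile.TemporaryDirectory() as tmp:
        single = Path(tmp) / "nwp.bin"
        store = Path(tmp) / "nwp_chunks"
        write_single_file(single, values)
        write_chunked(store, values)
        n_files = len(list(store.iterdir()))
        sub_a, bytes_a = read_whole_object(single, 5, 7)
        sub_b, bytes_b = read_chunked(store, 5, 7)
    return n_files, sub_a, bytes_a, sub_b, bytes_b
```

With these toy sizes, `demo()` produces 6 chunk files for the chunked layout (the "lots of small files" cost), and reading two values fetches 192 bytes from the single file versus 32 bytes from the chunked layout (the partial-read benefit).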