Closed JackKelly closed 2 years ago
@jacobbieker and @peterdudfield, what do you guys think about keeping the script to convert EUMETSAT native files to an intermediate file format in `satip` (instead of in `nowcasting_dataset`)? I don't have any strong feelings. A few advantages of keeping this script in `satip`:
- `nowcasting_dataset` doesn't have to be dependent on `satip` or `satpy`.
- Other users may want to convert `.nat` files to an easier-to-use intermediate format. These users might have no interest in `nowcasting_dataset`.

Yeah, I agree, keeping as much of the satellite-specific stuff in Satip makes sense. I generally like keeping the packages as separate as possible.
Features of this script
- Source directory of `.nat` files; and target directory for the Zarr.
- Only convert new `.nat` data: When the script starts, it checks through all the `.nat` files (recursively), and checks through the existing Zarr, and only converts data which is present in the `.nat` files but absent in the Zarr. I think you can append to Zarr stores using something like `xr.Dataset.to_zarr(mode='a', append_dim='time')`. Definitely have a look at the xarray docs on appending to Zarr. It's possible that appending to Zarr only works correctly if data is appended in order, but I'm not certain! (Zarr's fragility when it comes to appending data might be one strong argument for swapping to GeoTIFF or individual NetCDF files per EUMETSAT timestep, instead of Zarr... But let's try to get Zarr to work because it does seem to enable the fastest reads.)
- Save as `int16`, using only 10 bits per pixel per channel, i.e., re-scale each channel to [0, 1023], and save in `np.int16` dtype. This results in really good compression (better than using `float16`), and is probably more precise (see the raw benchmark results here. I benchmarked a bunch of compression algorithms. `compressor = numcodecs.Blosc(cname="zstd", clevel=5)` was the best setting I found. If we want to be really ambitious we could try compressing with a lossless, modern image compression algorithm like AVIF or WebP. Some more notes about these options in #13. But, for now, `zstd` is probably fine.)

Related:
- `.nat` files to NetCDF #13
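
For the "only convert new data" feature, here is a rough sketch of the bookkeeping, using only the standard library. It assumes each `.nat` filename contains a 14-digit `YYYYmmddHHMMSS` timestamp, and that the timestamps already in the Zarr have been read out as plain `datetime` objects. The names `timestamp_from_filename` and `find_unconverted` are made up for illustration and are not part of `satip`.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: every filename contains a 14-digit timestamp (YYYYmmddHHMMSS).
_TIMESTAMP = re.compile(r"\d{14}")


def timestamp_from_filename(path: Path) -> datetime:
    """Parse the first 14-digit run in the filename as a timestamp."""
    match = _TIMESTAMP.search(path.name)
    if match is None:
        raise ValueError(f"No timestamp found in {path.name}")
    return datetime.strptime(match.group(), "%Y%m%d%H%M%S")


def find_unconverted(nat_dir, existing_times):
    """Return (timestamp, path) pairs that are not in `existing_times`, oldest first."""
    existing = set(existing_times)
    # Search recursively; keep one path per timestamp.
    candidates = {
        timestamp_from_filename(path): path
        for path in Path(nat_dir).rglob("*.nat")
    }
    missing = [(t, p) for t, p in candidates.items() if t not in existing]
    return sorted(missing)
```

Returning the missing files oldest-first means they can be appended in time order, which sidesteps the open question of whether out-of-order appends work.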
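
And a minimal sketch of the 10-bit rescaling plus the write/append step. The per-channel `channel_min` / `channel_max` values, the function names, and the `"data"` variable name are assumptions for illustration. The `compressor` encoding key assumes a Zarr format 2 store; newer Zarr versions may expect a different key. NaN handling is left out.

```python
from pathlib import Path

import numpy as np

N_BITS = 10
MAX_INT = 2**N_BITS - 1  # 1023


def rescale_to_10bit(data, channel_min, channel_max):
    """Linearly map [channel_min, channel_max] to [0, 1023] and store as int16."""
    scaled = (np.asarray(data, dtype=np.float64) - channel_min) / (channel_max - channel_min)
    # Values outside the assumed range are clipped, not wrapped.
    scaled = np.clip(scaled, 0.0, 1.0) * MAX_INT
    return np.round(scaled).astype(np.int16)


def restore_from_10bit(encoded, channel_min, channel_max):
    """Approximate inverse of rescale_to_10bit."""
    return encoded.astype(np.float64) / MAX_INT * (channel_max - channel_min) + channel_min


def write_or_append(ds, zarr_path, var_name="data"):
    """Create the store with zstd compression, or append along `time` if it exists."""
    import numcodecs  # imported here so the rescaling helpers work without it

    if Path(zarr_path).exists():
        # No encoding here: the existing store already defines it.
        ds.to_zarr(zarr_path, mode="a", append_dim="time")
    else:
        compressor = numcodecs.Blosc(cname="zstd", clevel=5)
        ds.to_zarr(zarr_path, mode="w", encoding={var_name: {"compressor": compressor}})
```

With a range of 100 units, one 10-bit step is about 0.1 units, so the round trip is accurate to roughly half of that.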