C++ extensions of a driver to read virtual Zarr datasets described in JSON metadata format. Optimizes S3 read performance and multi-dimensional reconstruction of array datasets.
See issue here: https://github.com/Unidata/netcdf-c/issues/2777
main.cpp
-> json_parse.h
-> kerchunk_read.h
-> print_helpers.h
-> mult_dim_form.h
-> Finish

main.cpp: main entry point for the program; key calls include json_parse() and kerchunk_read()
config.h: inputs and settings for the program, e.g. HARDCODED_CHUNK_INDEX and HARDCODED_JSON_PATH (see the config sketch below)
custom_structs.h: holds custom structs shared across the program (excluding layer_t)
json_parse.h: given a JSON path, parses out the metadata relevant to all chunks and to the index-specific chunk (see the parsing sketch below)
json.hpp: nlohmann JSON processing library
iter_chunk.h: coordinates metadata extraction and runs it for multiple chunk indexes
kerchunk_read.h: given JSON metadata, reads the S3 byte stream and performs decompression, unshuffling, etc. until the original flat array is obtained; calls on mult_dim_form.h to regain full dimensions (see the range-request and decompression sketches below)
mult_dim_form.h: given a flat array, reconstructs the full dimensions as originally stored (bytes are read as a single flat dimension from the S3 stream; see the index sketch below)
print_helpers.h: debug printer functions, controlled by the constant DEBUG_PRINT_ON in config.h
make_kerchunk_refs.ipynb: notebook to generate JSON metadata from a selected S3 object
range_req_dynamic.ipynb: Python edition of the kerchunk read process; used for verification and testing of the C++ addition. Includes the S3 byte-range request, zlib decompression, unshuffle, dtype processing, and a comparison against xarray
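
A minimal sketch of config.h, under the assumption that the settings are plain constants; only HARDCODED_CHUNK_INDEX, HARDCODED_JSON_PATH, and DEBUG_PRINT_ON come from this README, and the values shown are placeholders, not the repo's defaults:

```cpp
// config.h -- sketch only; values are placeholders.
#pragma once
#include <cstddef>

// Chunk index to read when iter_chunk.h is not driving the loop.
constexpr std::size_t HARDCODED_CHUNK_INDEX = 0;
// Path to the kerchunk-style JSON metadata file.
constexpr const char* HARDCODED_JSON_PATH = "refs.json";
// Toggles the printers in print_helpers.h.
constexpr bool DEBUG_PRINT_ON = false;
```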
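A hedged sketch of the json_parse.h step using the vendored nlohmann json.hpp. The `refs` layout follows the standard kerchunk reference format; the variable name `var` and the chunk key `var/0.0` are hypothetical, and the real keys depend on the dataset described by the metadata:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "json.hpp"  // nlohmann JSON, vendored in this repo

using json = nlohmann::json;

int main() {
    std::ifstream in("refs.json");  // HARDCODED_JSON_PATH in the real driver
    json refs = json::parse(in);

    // In kerchunk references, .zarray entries are JSON encoded as strings,
    // so they must be parsed a second time.
    json zarray = json::parse(refs["refs"]["var/.zarray"].get<std::string>());
    auto shape  = zarray["shape"].get<std::vector<long>>();
    auto chunks = zarray["chunks"].get<std::vector<long>>();
    auto dtype  = zarray["dtype"].get<std::string>();

    // An individual chunk reference is a [url, offset, length] triple.
    json ref = refs["refs"]["var/0.0"];
    std::cout << ref[0].get<std::string>() << " @ "
              << ref[1].get<long>() << " + " << ref[2].get<long>() << "\n";
}
```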
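The byte-range read in kerchunk_read.h could look like the libcurl sketch below. The transport is an assumption (the repo may use the AWS SDK or another HTTP client instead), and fetch_range/write_cb are names invented here; an s3:// URL from the refs would first need rewriting to its HTTPS endpoint:

```cpp
#include <string>
#include <vector>
#include <curl/curl.h>

// Append received bytes to a std::vector<unsigned char>.
static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata) {
    auto* buf = static_cast<std::vector<unsigned char>*>(userdata);
    buf->insert(buf->end(), ptr, ptr + size * nmemb);
    return size * nmemb;
}

// Fetch bytes [offset, offset + length) of an HTTP(S)-accessible object.
std::vector<unsigned char> fetch_range(const std::string& url,
                                       long offset, long length) {
    std::vector<unsigned char> buf;
    CURL* curl = curl_easy_init();
    if (!curl) return buf;
    const std::string range = std::to_string(offset) + "-" +
                              std::to_string(offset + length - 1);
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_RANGE, range.c_str());  // HTTP Range header
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buf);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return buf;
}
```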
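The decompression and unshuffle steps, sketched with zlib's one-shot uncompress(). The function names are hypothetical; raw_size and elem_size would come from the .zarray metadata (shape/chunks and dtype):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>
#include <zlib.h>

// Inflate a zlib-compressed chunk. raw_size must be known up front
// (chunk element count times element size, from the .zarray metadata).
std::vector<unsigned char> zlib_decompress(const std::vector<unsigned char>& src,
                                           std::size_t raw_size) {
    std::vector<unsigned char> dst(raw_size);
    uLongf dst_len = static_cast<uLongf>(raw_size);
    if (uncompress(dst.data(), &dst_len, src.data(),
                   static_cast<uLong>(src.size())) != Z_OK)
        throw std::runtime_error("zlib uncompress failed");
    dst.resize(dst_len);
    return dst;
}

// Undo the byte-shuffle filter: shuffled data stores byte 0 of every
// element first, then byte 1 of every element, and so on.
std::vector<unsigned char> unshuffle(const std::vector<unsigned char>& src,
                                     std::size_t elem_size) {
    const std::size_t n = src.size() / elem_size;
    std::vector<unsigned char> dst(src.size());
    for (std::size_t j = 0; j < elem_size; ++j)
        for (std::size_t i = 0; i < n; ++i)
            dst[i * elem_size + j] = src[j * n + i];
    return dst;
}
```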
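The core of mult_dim_form.h amounts to mapping flat row-major (C-order, the Zarr default) offsets back to per-dimension coordinates. A self-contained sketch with an invented unflatten() helper:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Convert a flat row-major index into per-dimension coordinates.
std::vector<std::size_t> unflatten(std::size_t flat,
                                   const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> idx(shape.size());
    for (std::size_t d = shape.size(); d-- > 0;) {
        idx[d] = flat % shape[d];  // coordinate along dimension d
        flat  /= shape[d];         // carry into the slower dimensions
    }
    return idx;
}

int main() {
    // For a 2x3x4 chunk, flat index 13 maps to (1, 0, 1),
    // since 13 = 1*(3*4) + 0*4 + 1.
    for (std::size_t c : unflatten(13, {2, 3, 4})) std::cout << c << ' ';
    std::cout << '\n';
}
```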