The idea is to speed up opening netcdf4/hdf5 datasets with a NASA-specific optimization: load data like `xr.open_mfdataset` at kerchunk/zarr speeds by translating existing dmr++ metadata files to zarr metadata on the fly. Much more context and discussion here.
The current bottleneck is creating the xml object (dmr++ files are XML) with `ET.fromstring(dmr_str)`, since the `xml.etree.ElementTree` library needs to read the text, check that it is well-formed, and build a full element tree. I am looking into a non-validating parser like `xml.parsers.expat`.
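For illustration, here is a stream-based parse with `xml.parsers.expat` that only records element names and `name` attributes instead of materializing a whole tree. The element names below are made up for the sketch, not a real dmr++ schema:

```python
# Sketch: scanning an XML string with the streaming, non-validating
# expat parser instead of building a full ElementTree.
import xml.parsers.expat

dmr_str = """<?xml version="1.0"?>
<Dataset name="example">
  <Float32 name="sst"/>
  <Int16 name="quality_flag"/>
</Dataset>"""

names = []

def start_element(tag, attrs):
    # Record only what we need; expat never builds a tree in memory.
    if "name" in attrs:
        names.append((tag, attrs["name"]))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(dmr_str, True)

print(names)
# → [('Dataset', 'example'), ('Float32', 'sst'), ('Int16', 'quality_flag')]
```

Because expat fires callbacks as it reads, it can skip straight past the parts of the document the parser does not care about, which is where the potential speedup over `ET.fromstring` comes from.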
virtualizarr PR for the parser here. earthaccess PR here.

earthaccess additions:

- xarray's concatenation logic to create virtual views of netcdfs (more details in the virtualizarr documentation)
- the zarr engine in xarray to load a dataset (with indexes)

Questions/Suggestions:

- Changes to the API?
- NASA datasets you want me to test? So far: MUR-JPL-L4-GLOB-v4.1 (netcdf), SWOT_SSH_2.0 (netcdf), ATL-03 ICE-SAT (hdf5)
- Take a look at the virtualizarr parser PR and leave suggestions
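For context on what the on-the-fly translation produces: each dmr++ chunk record (byte offset, byte count, chunk index) maps to a kerchunk/zarr-style reference entry pointing back into the original file. A minimal sketch with hypothetical inputs follows; the key layout mimics kerchunk's reference format, and the file URL, variable, shapes, and dtype are made up, not taken from a real granule:

```python
import json

# Hypothetical chunk records as a dmr++ parser might yield them:
# (variable name, chunk index per dimension, byte offset, byte count)
chunks = [
    ("sst", (0, 0), 4536, 9728),
    ("sst", (0, 1), 14264, 9728),
]

refs = {}
for var, index, offset, nbytes in chunks:
    key = f"{var}/{'.'.join(map(str, index))}"
    # kerchunk-style entry: [url, offset, length]
    refs[key] = ["s3://bucket/granule.nc", offset, nbytes]

# Zarr array metadata for the variable (shape/dtype invented here;
# in the real translation these come from the dmr++ dimension/type info).
refs["sst/.zarray"] = json.dumps({
    "shape": [1, 2],
    "chunks": [1, 1],
    "dtype": "<f4",
    "compressor": None,
    "fill_value": None,
    "filters": None,
    "order": "C",
    "zarr_format": 2,
})

print(refs["sst/0.1"])
# → ['s3://bucket/granule.nc', 14264, 9728]
```

Because the references only point at byte ranges, no data is copied: a zarr reader can fetch exactly the chunks it needs from the original netcdf4/hdf5 file.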