Open kadykov opened 11 months ago
What do you think the right behaviour should be? Catalog entries are special in Intake (<2.0) in that they get their subentries eagerly, so they have access to the file metadata immediately. Is this what you are getting at?
I expected that cat.netcdf.metadata would also include the metadata from the file, like this: {'catalog_metadata': 'The metadata in the catalog entry', 'xarray_metadata': 'The metadata in the xarray file'}.
But now, the xarray_metadata key appears only after reading the whole file by executing cat.netcdf.read().
I think it would be better to have "lazy" metadata reading from files because there also could be some useful information... What do you think?
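To make the request concrete, here is a minimal sketch of what "lazy" metadata could look like. It is not Intake's implementation: the class `LazyMetadataEntry` and the helper `fake_header_reader` are hypothetical names used only for illustration, and the merge policy (catalog values win on key clashes) is an assumption.

```python
from functools import cached_property


class LazyMetadataEntry:
    """Hypothetical catalog entry; NOT part of Intake's API."""

    def __init__(self, catalog_metadata, read_file_metadata):
        self._catalog_metadata = dict(catalog_metadata)
        # Callable that reads only the file header/attributes.
        self._read_file_metadata = read_file_metadata

    @cached_property
    def _file_metadata(self):
        # Runs on first access only; the result is cached afterwards.
        return dict(self._read_file_metadata())

    @property
    def metadata(self):
        # Assumed policy: catalog-level values override file-level ones.
        return {**self._file_metadata, **self._catalog_metadata}


calls = []


def fake_header_reader():
    # Stand-in for a cheap read of the file's attributes.
    calls.append("read")
    return {"xarray_metadata": "The metadata in the xarray file"}


entry = LazyMetadataEntry(
    {"catalog_metadata": "The metadata in the catalog entry"},
    fake_header_reader,
)
```

With this sketch nothing is read when the entry is created. The first access to `entry.metadata` triggers a single header read, and later accesses reuse the cached result.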
The .discover() method is meant exactly for this purpose: to get information from the file with a minimum of reads. Its usefulness varies by file type.
Actually, xarray is lazy by default, so even if you do a .read(), you do not load all the data into memory, only enough for xarray to understand the file's layout (typically the attributes and coordinate arrays).
The entries powered by the intake_xarray driver do not lazily read metadata from the files.
As you can see from the output, the metadata from the entry powered by the intake driver has the field from the zarr file:
However, after reading the files, the metadata is complete:
Output:
OS: Windows 10
python 3.11.5
intake 0.7.0
intake_xarray 0.7.0
xarray 2023.8.0
zarr 2.16.1