Add `columns_with_iterables` parameter to `esm_datastore`

intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.

https://intake-esm.readthedocs.io

Apache License 2.0


Add `columns_with_iterables` parameter to `esm_datastore` #589

Closed dougiesquire closed 1 year ago

dougiesquire commented 1 year ago

This PR adds an optional `columns_with_iterables` parameter to the `intake_esm.esm_datastore` API that specifies which columns to convert with `ast.literal_eval`. This lets intake YAML descriptions of intake-esm catalogs with multi-variable assets work by default. See #587 for context and motivation.
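To illustrate what converting a column with `ast.literal_eval` means, here is a minimal standard-library sketch. It is not the intake-esm implementation: the CSV content, the `parse_rows` helper, and the `variable` column name are all made up for illustration. Only the `columns_with_iterables` name is taken from this PR.

```python
import ast
import csv
import io

# Hypothetical catalog CSV: the "variable" column stores lists as strings,
# which is how multi-variable assets typically end up serialized.
CATALOG_CSV = '''path,variable
/data/a.nc,"['tas', 'pr']"
/data/b.nc,"['tos']"
'''


def parse_rows(text, columns_with_iterables):
    """Read CSV rows and turn the named columns from strings into Python objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in columns_with_iterables:
            # literal_eval only evaluates Python literals, so "['tas', 'pr']"
            # becomes a real list without executing arbitrary code.
            row[col] = ast.literal_eval(row[col])
        rows.append(row)
    return rows


rows = parse_rows(CATALOG_CSV, columns_with_iterables=["variable"])
```

Without the conversion, `rows[0]["variable"]` would be the string `"['tas', 'pr']"` rather than a list, so per-variable searches on that column would not behave as expected.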

Note that I also explored using intake dataset transforms, but I don't think they provide quite the functionality we need.

Fixes #587

Checklist

[x] Unit tests for the changes exist
[x] Tests pass on CI
[x] Documentation reflects the changes where applicable

mgrover1 commented 1 year ago

Also a test here would be helpful

dougiesquire commented 1 year ago

Thanks @mgrover1. Tests and docs added. Note that I had to pin `netcdf4<1.6.0` to get the tests to pass and the docs to build. This is due to a change in 1.6.1 that seems to be causing issues in many places; see e.g. https://github.com/Unidata/netcdf4-python/issues/1192

andersy005 commented 1 year ago

thank you for this addition, @dougiesquire & @mgrover1!

dcherian commented 1 year ago

Should this info be stored in the JSON instead? It's a property of the catalog, so the information can be specified at write-time instead of at read-time.

andersy005 commented 1 year ago

> Should this info be stored in the JSON instead? It's a property of the catalog, so the information can be specified at write-time instead of at read-time.

i'm definitely in favor of supporting this when it's defined in the JSON and adding it to the spec: https://github.com/intake/intake-esm/blob/main/docs/source/reference/esm-catalog-spec.md

mgrover1 commented 1 year ago

> > Should this info be stored in the JSON instead? It's a property of the catalog, so the information can be specified at write-time instead of at read-time.
>
> i'm definitely in favor of supporting this when it's defined in the JSON and adding it to the spec: https://github.com/intake/intake-esm/blob/main/docs/source/reference/esm-catalog-spec.md

Agreed! I think allowing both here makes sense.
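As a rough sketch of what "allowing both" could look like: the catalog JSON declares the columns at write-time, and a read-time argument can add to them. Everything here is hypothetical, including the `columns_with_iterables` JSON key and the `resolve_iterable_columns` helper. Neither is part of the current spec or of intake-esm as discussed in this thread.

```python
import json

# Hypothetical catalog description. The "columns_with_iterables" key is an
# assumed name for illustration only; it is not defined by the spec here.
CATALOG_JSON = '{"id": "demo", "columns_with_iterables": ["variable"]}'


def resolve_iterable_columns(catalog_json, read_time=None):
    """Merge columns declared in the JSON with any passed at read-time."""
    declared = json.loads(catalog_json).get("columns_with_iterables", [])
    merged = list(declared)
    for col in read_time or []:
        # Keep the write-time order and avoid duplicates.
        if col not in merged:
            merged.append(col)
    return merged


from_json_only = resolve_iterable_columns(CATALOG_JSON)
from_both = resolve_iterable_columns(CATALOG_JSON, read_time=["member_id"])
```

Merging is one possible policy. Having the read-time argument override the JSON entirely would be another, and the thread does not settle which is preferred.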