dougiesquire closed this issue 2 weeks ago.
👋🏽 @dougiesquire,
The primary reason we use `ds.data_vars` instead of `ds.variables` is that we don't want xarray to perform unnecessary and expensive checks on the coordinates when merging or concatenating datasets. In the past, merging and concatenating model data produced on different machines often failed because of roundoff errors in the coordinates, which was a considerable issue. By using `ds.data_vars`, we were able to bypass these unnecessary coordinate comparisons in xarray (e.g. when concatenating two variables from the same model with compatible grids). However, I'm not sure this is still necessary.
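To illustrate the kind of roundoff failure described above, here is a minimal xarray sketch. It assumes two files whose curvilinear `TLAT` coordinate differs only by a tiny roundoff error; the variable names `temp`/`salt`, the grid values, and the `1e-12` offset are invented for illustration. With `compat="no_conflicts"`, `xr.merge` compares the non-index coordinate value by value and raises; `compat="override"` skips that comparison and keeps the first copy. This shows the underlying xarray behavior only, not Intake-ESM's own code.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 curvilinear latitude grid shared by two output files
lat = np.linspace(-10.0, 10.0, 6).reshape(2, 3)
dims = ("nlat", "nlon")

ds_a = xr.Dataset(
    {"temp": (dims, np.ones((2, 3)))},
    coords={"TLAT": (dims, lat)},
)
# Same grid, but this file's TLAT carries a tiny roundoff difference
ds_b = xr.Dataset(
    {"salt": (dims, np.full((2, 3), 35.0))},
    coords={"TLAT": (dims, lat + 1e-12)},
)

# An exact, value-by-value comparison of TLAT fails on the roundoff difference
try:
    xr.merge([ds_a, ds_b], compat="no_conflicts")
    merge_failed = False
except xr.MergeError:
    merge_failed = True

# "override" skips the comparison and keeps TLAT from the first dataset
merged = xr.merge([ds_a, ds_b], compat="override")

print(merge_failed)              # True
print(sorted(merged.data_vars))  # ['salt', 'temp']
```

`compat="override"` trades safety for robustness: it silently trusts the first dataset's coordinates, which is only appropriate when the grids are known to be the same.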
So I'm in favor of relaxing this requirement to see what happens. Another option would be to introduce a configurable setting that lets users opt in to using `ds.variables` in place of `ds.data_vars`. This latter approach is likely better if we aim to ensure backward compatibility.
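A configurable setting like the one suggested could look roughly like the sketch below. `select_requested` and its `include_coords` flag are hypothetical names, not part of Intake-ESM's API, and filtering a list of requested names before indexing the Dataset is a simplification of how the library actually selects variables. Defaulting the flag to `False` keeps the current `ds.data_vars` behavior, so existing users would see no change.

```python
import numpy as np
import xarray as xr


def select_requested(
    ds: xr.Dataset, requested: list[str], include_coords: bool = False
) -> xr.Dataset:
    """Hypothetical helper: keep only the requested names that exist in ``ds``.

    include_coords=False (backward-compatible default): only data variables
    are eligible. include_coords=True: anything in ``ds.variables``, i.e.
    data variables and coordinates, can be selected.
    """
    pool = ds.variables if include_coords else ds.data_vars
    names = [name for name in requested if name in pool]
    return ds[names]


dims = ("nlat", "nlon")
ds = xr.Dataset(
    {"temp": (dims, np.zeros((2, 3)))},
    coords={"TLAT": (dims, np.arange(6.0).reshape(2, 3))},
)

print("TLAT" in select_requested(ds, ["TLAT"]).variables)                       # False
print("TLAT" in select_requested(ds, ["TLAT"], include_coords=True).variables)  # True
```

For a request containing only data variables, such as `["temp"]`, both settings return identical Datasets; the flag only matters when a coordinate name is requested.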
Hello and thanks @andersy005! I could well be misunderstanding, but I think using `ds.variables` would only change things for users who have included coordinate variables in their catalog (and presumably want to be able to open them). I don't understand why there would be any performance implications: `variables` is constructed only from the variables included in the catalog, so I don't think any additional coordinates would be loaded unless they are explicitly requested from the catalog. FWIW, changing to `ds.variables` doesn't appear to make the tests run any slower.
If you agree, I'd advocate not adding the extra complexity of making this change configurable.
@dougiesquire - checking back in here, are you interested in making a pull request with the fix you mentioned?
Description
I'd like to include coordinate variables in my Intake-ESM datastore so that they can be searched for and opened like any other variable. However, Intake-ESM currently only allows opening the xarray `data_vars` from an asset. If I try to open a coordinate variable from my datastore, I get back an empty xarray Dataset. The obvious fix is to edit the line in the link above to include any variable in `ds`, but there may be a reason I don't understand for why things are done the way they currently are.
What I Did
Reproducing this requires a multi-variable catalog that includes a coordinate variable in the lists of variables. E.g. one can edit the first row of `tests/sample-catalogs/multi-variable-catalog.csv` to include `"TLAT"` in the list of variables. Then:
returns a Dataset that does not contain `"TLAT"`.
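The symptom can be sketched in plain xarray, without Intake-ESM. This assumes (for illustration only) an in-memory stand-in for one asset in which `temp` carries a CF `coordinates="TLAT"` attribute; the data and variable names are hypothetical, and the list comprehension is a simplification of the selection step, not Intake-ESM's actual code. CF decoding, which `xr.open_dataset` applies by default, promotes `TLAT` to a coordinate, so a filter against `ds.data_vars` drops it and indexing with the resulting empty list yields an empty Dataset.

```python
import numpy as np
import xarray as xr

dims = ("nlat", "nlon")

# Hypothetical stand-in for one asset: TLAT starts as an ordinary variable,
# and temp's CF "coordinates" attribute points at it
raw = xr.Dataset(
    {
        "temp": (dims, np.zeros((2, 3)), {"coordinates": "TLAT"}),
        "TLAT": (dims, np.arange(6.0).reshape(2, 3)),
    }
)

# CF decoding promotes TLAT from a data variable to a coordinate
ds = xr.decode_cf(raw)
print("TLAT" in ds.coords, "TLAT" in ds.data_vars)  # True False

# Simplified current selection: keep requested names found in ds.data_vars
requested = ["TLAT"]
subset = ds[[name for name in requested if name in ds.data_vars]]
print(len(subset.variables))  # 0, i.e. an empty Dataset

# Filtering against ds.variables instead keeps the coordinate
fixed = ds[[name for name in requested if name in ds.variables]]
print("TLAT" in fixed.coords)  # True
```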
Version information: output of `intake_esm.show_versions()`