xpublish-community / xpublish-edr

OGC EDR router for xpublish
BSD 3-Clause "New" or "Revised" License
3 stars 3 forks source link

Supporting a flexible dataset schema #37

Open jhamman opened 3 months ago

jhamman commented 3 months ago

Xpublish-EDR currently support a very narrow dataset schema. For example, coordinates must be named X, Y, and T corresponding to longitude, latitude, and time. When datasets do not conform to this, there is a somewhat confusing error raised about CF conventions:

https://github.com/xpublish-community/xpublish-edr/blob/019e53acd2e0ad5a1d909d1acfe9863f2e90e51b/xpublish_edr/plugin.py#L79-L85

Of course, there are many ways a dataset can be CF complaint (with or without these coordinates present). My understanding of the EDR spec is that it is agnostic to the names of these coordinates in the underlying data source. So Xpublish-EDR should be able to find a way to bridge between coordinate names provided they are real CF.

Can we work on making Xpublish-EDR more flexible here? CF-Xarray could offer a shortcut here.

mpiannucci commented 3 months ago

@abkfenris and I have talked about this before, because a lot of the (not super thought out) abstractions for xpublish-wms overlap a ton with EDR functionality, and same with our newer subsetting prototype plugin.

Both of these are built on top of cf-xarray, which as you mentioned is probably the most important piece here.

abkfenris commented 3 months ago

Xpublish-EDR should be using CF-Xarray's coordinate criteria to figure out which coordinates correspond to X, Y, and T, and not require datasets to have coordinates be named exactly that.

Here's the attributes on one of my datasets that is playing nice with Xpublish-EDR/CF-Xarray (though my Xpublish server is currently being cranky, so I gotta go figure that out):

image
jhamman commented 3 months ago

Thanks both for your input here. I'm realizing now that I jumped to a conclusion about CF (and cf-xarray) that may not have been correct. I'll do a bit more digging and return with a more concrete request.

abkfenris commented 3 months ago

I think there is probably a place for configuration on both the plugin and dataset sides.

We could enable setting up the ERD plugin with some custom coordinate criteria so folks could enable matching on patterns that are common in their datasets.

On the dataset side, it would probably make sense to have a standard attribute pattern that can prompt plugins but doesn't get sent to users by plugins that would otherwise share attributes.

ds['funky_time'].attrs['_xpublish']['edr']['axis'] = 'T'

jhamman commented 3 months ago

After looking at this in more detail, I understand the problem -- our dataset is simply missing the axis (or similar) attributes for X, Y, and T.

The feature request I would offer here then is to have some sort of verification tool that runs when registering a dataset with a router. If an error or warning had been issued at startup, this would have been much easier to debug.

abkfenris commented 3 months ago

Hmm, I don't think we'd want to make it the default, since with dataset providers Xpublish doesn't necessarily know of all datasets at startup. We also don't have a hook currently to call plugins on startup, but after datasets may be available.

How about an endpoint to check if the dataset has the right attributes?

There is an existing issue in CF-Xarray for some sort of checker https://github.com/xarray-contrib/cf-xarray/issues/366 but I don't think we need to be blocked by that. I think we can do a simpler query to see if CF-Xarray has identified the axes.