Open rabernat opened 2 years ago
Thanks for opening an issue, @rabernat ! Does seem like an exciting proposal and a great way to keep maintaining oceanspy in the long run. Will get to this after ocean sciences. Till then, I'll wait to hear what @malmans2 and @Mikejmnez have to say.
Thanks for the enthusiasm @asiddi24! It's great that you see this as exciting. I see it as more of an unglamorous backend maintenance task that oceanspy users may not even notice...but will ultimately lead to better performance and maintainability.
I think they're all good suggestions! OceanSpy definitely needs maintenance/refactoring. I've another couple of suggestions:
xoak will get superseded by xarray refactoring. I'm not up to speed with the ongoing xarray refactor, but xoak has been working great for me so far.
The xoak creator (@benbovy) is also the one leading the xarray index refactor. So I imagine they will converge in some way. Maybe xoak will just provide the index objects themselves, which xarray can then use? Benoit, we would love to hear about your plans for xoak (and get your general feedback on this issue).
Thanks for pinging me @rabernat.
Maybe xoak will just provide the index objects themselves, which xarray can then use?
Yes, that's the plan with Xoak. I think it will still be useful to provide Dataset / DataArray accessors, for example to expose a low-level, Xarray-compatible API such as an .xoak.query() method that returns the indices of, and distances to, the nearest neighbors.
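To make the "indices and distances" idea concrete, here is a minimal pure-Python sketch of what such a query could return on a curvilinear grid. It is not xoak's implementation: `haversine_km`, `query_nearest`, and the toy grid are hypothetical names and data chosen for illustration, and a real index would use a tree structure rather than a brute-force scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for great-circle distances


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def query_nearest(lat2d, lon2d, lat, lon):
    """Hypothetical helper: return ((j, i), distance_km) of the grid point
    closest to (lat, lon), scanning 2D coordinate arrays by brute force."""
    best_idx, best_dist = None, math.inf
    for j, (lat_row, lon_row) in enumerate(zip(lat2d, lon2d)):
        for i, (glat, glon) in enumerate(zip(lat_row, lon_row)):
            dist = haversine_km(lat, lon, glat, glon)
            if dist < best_dist:
                best_idx, best_dist = (j, i), dist
    return best_idx, best_dist


# Toy curvilinear grid: latitude and longitude both vary along both axes,
# so neither can serve as a 1D dimension coordinate.
lat2d = [[60.0, 60.5, 61.0], [61.0, 61.5, 62.0]]
lon2d = [[-30.0, -29.0, -28.0], [-30.5, -29.5, -28.5]]

idx, dist = query_nearest(lat2d, lon2d, 61.4, -29.4)
print(idx, round(dist, 1))
```

The returned `(j, i)` pair is the kind of positional indexer that could then be handed to `.isel()` along the two grid dimensions.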
While I've not looked much into Oceanspy and this is a bit outside of my domain of expertise, the subsample functions seem like good use cases for experimenting with Xarray custom indexes, which at this stage would also be really helpful for the Xarray index refactor itself, as I'm sure there's still much room for improvement!
Consider separating these indexes into a standalone package which provides Xarray entrypoints, such that the indexes can be used independently from oceanspy
Make use of xoak to perform nearest neighbor interpolations and extract stations/moorings/floats.
Those are sensible points. I think that in the mid/long term it will be better for the ecosystem if we can avoid a jungle of Xarray indexes with lots of overlapping features.
In https://github.com/pydata/xarray/pull/5692 we require that matching indexes for alignment (merge, etc.) must have exactly the same type, which limits interoperability between indexes but makes the implementation much simpler. We might eventually support some kind of "duck" indexes, but it's a considerably harder problem.
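As a toy illustration of the "exactly the same type" rule (not xarray's actual code; `ToyIndex`, `ToyTreeIndex`, and `can_align` are hypothetical names), a strict check compares types by identity, so even a subclass does not count as a match:

```python
class ToyIndex:
    """Hypothetical stand-in for an index wrapping a list of labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def equals(self, other):
        return self.labels == other.labels


class ToyTreeIndex(ToyIndex):
    """Hypothetical subclass, e.g. a tree-based variant of the same index."""


def can_align(a, b):
    """Strict rule: only indexes of exactly the same type are compared.

    Using `type(a) is type(b)` rather than `isinstance` keeps the logic
    simple: each index class only needs to know how to compare itself
    with other instances of the same class.
    """
    return type(a) is type(b) and a.equals(b)


same = can_align(ToyIndex([1, 2, 3]), ToyIndex([1, 2, 3]))
mixed = can_align(ToyIndex([1, 2, 3]), ToyTreeIndex([1, 2, 3]))
print(same, mixed)
```

Supporting "duck" indexes would mean relaxing the type check and deciding how two different index classes should compare their contents, which is where the extra difficulty comes in.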
Current Status
An important part of oceanspy's feature space is concerned with providing custom ways to select data from Xarray datasets. These are enumerated in the API docs. The relevant functions are
Oceanspy needed to implement these functions because xarray's built-in indexers (e.g. .sel() or .interp()) were not capable of performing the required operations in the case of curvilinear grids.
Ongoing Xarray Refactor
Supported by a CZI grant, the Xarray team has been hard at work on the so-called Flexible Indexes Refactor.
Additional information about the refactor can be found at:
Once PR 5692 is merged, this feature should be usable for development purposes.
Proposal: Refactor Oceanspy subsample functions to be custom Xarray indexes
The whole point of this refactor ("allow third-party libraries to implement custom indexing routines") is to enable projects like OceanSpy to bring their own concepts of indexing directly to xarray datasets. So I thought I would propose we do exactly that. The steps would look something like this.
Pros
Cons