Open DanRyanIrish opened 5 years ago
We had a very productive discussion with Martin Durant @martindurant and Jim Bednar at the sunpy coordination meeting today. I think that as far as WCS / xarray integration goals go from an ndcube perspective this issue is probably the best place to start.
As I mentioned in the discussion today I am very happy to bring the WCS expertise to any effort to build out xarray support for WCSes. I suggest we use this issue as a good place to coordinate this work and discussion with other interested parties.
Also relevant to this is @eteq's issue https://github.com/pydata/xarray/issues/3620
cc @jbednar , who will get in touch when his team is ready
Hi all! I am the third member of the @jbednar and @martindurant trio. Sorry I couldn't attend the discussion a couple of weeks ago, but I have watched the video of the conversation.
I am going to be at the NumFOCUS Project Summit in 2 weeks and hope to have a face-to-face chat with @wtbarnes. Hopefully we can involve the xarray
folks too. I am happy to chat remotely before that if it helps!
hey @ianthomas23 :wave:
Unfortunately I wont be at the NumFOCUS summit, but I am happy to chat either before or afterwards. I am super keen to see how far we can get with WCS + xarray and how to integrate it into ndcube.
Do you have a plan for what you are going to be working on yet?
Do you have a plan for what you are going to be working on yet?
I think we have to have a pragmatic 2-stage approach. Firstly write some experimental code using WCS (astropy.wcs
?) that is reasonably correct and reasonably performant so that we can progress with the rest of our project. This gives us a bit more time and space for stage 2 which is to find the correct long-term solution. Exactly what stage 2 looks like has yet to be determined, as does where the solution lies and who works on it!
We are hoping to talk to some xarray
devs next week sometime to get their latest updates. Maybe we should have a chat after that?
We are hoping to talk to some xarray devs next week sometime to get their latest updates. Maybe we should have a chat after that?
I'd be interested to join the discussion. We've made progress on Xarray custom indexes recently and hopefully it should be ready for "public" use (although still very experimental) with the next release. I'm happy to present about the latest updates on this front. I'd also like to allocate some time on working on or helping domain experts experimenting with custom indexes. From what I understand in https://github.com/pydata/xarray/issues/3620, WCS + xarray looks like a nice fit for testing this feature!
Hi @ianthomas23. Great to hear you'll be at the NumFocus Summit next week. Though I'll be the only SunPy maintainer present, I believe there will also be several astropy maintainers there in person as well who may be interested in this discussion or who can at least provide some WCS expertise.
Following the in-person meeting last week at the NumFOCUS Project Summit, I will progress with some standalone code to go from FITS file to Xarray using flexible indices to generate WCS coordinates using the AstroPy WCS module. This will probably only support the subset of WCS that our current project requires. Other meeting attendees to provide guidance: @eteq (AstroPy), @wtbarnes and @Cadair (SunPy), @andersy005 (Xarray).
The subsequent longer-term goal is that this code will become part of wider ndcube/Xarray interoperability.
Here is some progress to report: https://nbviewer.org/gist/ianthomas23/bbf85d1a38f8f161a2c1f641d9465096.
This is a Jupyter Notebook that uses an xarray.Accessor
to convert from pixel to world coordinates using astropy
transforms. It follows the approach of https://github.com/pydata/xarray/issues/3620#issuecomment-855710036 in using a pixel_to_world
accessor function that returns a new xarray.Dataset
with WCS coords. The next comment in that issue describes an alternative approach that combines both pixel and world coordinates in the same Dataset
.
This is pretty much everything that I need in the short term. It is quite specific to my test data and will need generalising to other use cases. I have played around a bit with a functional index (xarray.core.indexes.Index
) a little as well, to allow e.g. slicing of pixel indexes based on WCS values, but I don't have anything concrete from that as I don't think I need it.
Probably not strictly relevant to target audience here, but if you download and run the notebook locally you can interactively zoom and pan the image using the mouse and controls on the right.
There is also this recent discussion on the Pangeo discourse https://discourse.pangeo.io/t/vector-data-cubes/2904 about vector data cubes in xarray
that might be relevant to the long-term goal of greater ndcube
/xarray
interoperability.
Hi @ianthomas23 great to see some progress :rocket:
Is it possible with a functional index to not have to compute the grid of all the coordinates (in a way more nuanced than making a delayed dask call)? For some of my datasets the coordinate grid is prohibitively large, and one of the key things I think that the new functionality in xarray provides would be the ability to only call the WCS when computing the world coordinates is required?
I am curious about why you chose to use WCSPixel2WorldTransform
rather than using the WCS
classes methods? Specifically I expect pixel_to_world_values
would be useful for you.
Thanks for sharing @ianthomas23.
I wanted to give a try at creating an Xarray WCSIndex
, so I started from your notebook and uploaded a new version here.
In this version, I updated the computation of the world coordinates so that these are fully lazy (i.e., x_wcs
and y_wcs
dask arrays are created from scratch). Hopefully it should scale to large grids reasonably well. I'm sure it can be further optimized and improved, though, there's not much functionality. I only implemented point-wise selection of world coordinates (not sure how to implement selection of slices from WCS coordinates).
hey @benbovy thanks for the example with the lazy loading! I assume that all the xarray machinery requires an array of coordinates to be generated, lazy or not? There's no functionality for computing them on the fly?
Do you happen to have a link to the best place for me to learn more about what's shown in your example, i.e how Index classes work and what custom ones should support? (Basically asking so I can attempt to answer how to do other selections).
I am curious about why you chose to use
WCSPixel2WorldTransform
rather than using theWCS
classes methods? Specifically I expectpixel_to_world_values
would be useful for you.
@Cadair I just used whatever I found that worked for me. I am happy to take advice on a better approach.
I assume that all the xarray machinery requires an array of coordinates to be generated, lazy or not? There's no functionality for computing them on the fly?
Yes, an Xarray index must always be attached to one or more coordinates in a Dataset or DataArray. That said, the model is quite flexible regarding how to arrange coordinates and indexes and what to do in indexes.
Actually, it could be possible to create a WCSIndex
from the x
and y
pixel coordinates and skip creating the x_wcs
and y_wcs
coordinates entirely. You can see in the notebook that the WCSIndex
does not make any use of the Xarray world coordinates.
WCSIndex
might encapsulate two xarray.indexes.PandasIndex
instances for the two x
and y
pixel coordinates so that they can be used like "classic" dimension coordinates, i.e., for selection, alignment, etc. In addition, WCSIndex.wcs
could be used for selection based on world coordinates, like in the notebook but maybe using an API like ds.sel(x=x_world_labels, y=y_world_labels, method="world_nearest")
. The approach is somewhat similar than the one used the geo (vector) index here: https://github.com/martinfleis/xvec/pull/7.
There's an open PR in Xarray to make it easier creating custom indexes that encapsulate multiple PandasIndex
instances: https://github.com/pydata/xarray/pull/7182
Although it might be possible to reuse the method
parameter of .sel()
for interpreting the given labels either as pixel or world values (cf. example above), a cleaner solution may be to provide custom options. But it is not supported yet (https://github.com/pydata/xarray/issues/7099).
Do you happen to have a link to the best place for me to learn more about what's shown in your example, i.e how Index classes work and what custom ones should support?
Not yet, unfortunately. This is still very much work in progress. Some useful links:
I am fascinated by this ongoing discussion, thank you everyone for taking part!
I have a side-question on WCS, and thought I could add it here. Suppose I have a 3D cube of data, where each 2D slice is a simple image with a separate analytical WCS, all with the same projection but different parameters. Is it possible to construct a single WCS for the whole cube, with the WCS parameters being a lookup table dependent on the z location in the cube?
This is the kind of situation that kerchunk may be faced with.
Is it possible to construct a single WCS for the whole cube, with the WCS parameters being a lookup table dependent on the z location in the cube?
I guess it would be possible to store a list of astropy.wcs.WCS
objects for the z dimension in the wcs
Dataset accessor and/or provide a StackedWCSIndex
that encapsulates (1) a list of WCSIndex
instances and (2) an additional PandasIndex
for the z dimension? Unless there's a better way of using astropy.wcs.WCS
?
(Ideally all WCSIndex
instances in a StackedWCSIndex
would reuse the same two PandasIndex
objects for the x/y pixel coordinates).
Is it possible to construct a single WCS for the whole cube, with the WCS parameters being a lookup table dependent on the z location in the cube?
With astropy.wcs.WCS
it is not. It's technically possible with gWCS which conforms to the same WCS API, but it's pretty experimental.
I think @benbovy's approach of a list of WCSIndexes could work, but I am not sure I have enough grounding in xarrray and indexes to know for sure.
As an aside, I have complex 5D WCSes where the pointing (wcs parameters) change along two of the axes (which are time and wavelength) as well as more intertwined ones where pointing changes along one of the spatial axes and the time axes (of which there are two) so these things can get really complex!!
Here is a new notebook demonstrating multidimensional lazy-evaluated WCS coordinates in an xarray.Dataset
. The data array has shape (nwavelength, ntime, ny, nx)
as there are nwavelength
sensors taking images of shape (ny, nx)
at ntime
regular time intervals. Of the 11 WCS attributes required to construct a WCS transformation, 4 of them (all strings) are constant for this data and therefore stored as Dataset
attributes. The other 7 (all floats) are different for each image so each is stored as a separate DataArray
of shape (nwavelength, ntime)
. The x and y WCS coordinates are each a dask.array
of shape (nwavelength, ntime, ny, nx)
and are lazy-evaluated on a per-image basis given a wavelength index and time index. The astropy.WCS
transformation objects are created on demand using the 4 constant plus 7 varying WCS attributes.
The example uses an xarray
Accessor
but not an Index
. Example data is pretty small (2, 10, 4096, 4096)
. I have gone up to (6, 1246, 4096, 4096)
with a preprocessed local zarr
dataset of ~65 GB where each (ny, nx)
image is a separate chunk and the performance scales well.
The notebook isn't live so nothing happens when you play with the widgets at the bottom. I can do a live demo of this, or the bigger one, if anyone is interested.
Description
Support of gWCS in ndcube enables the possibility of converting between an ndcube and xarray objects. Although ndcube and xarray will still be preferred in different situations, the ability oto convert between them may still be highly useful. Is this something ndcube should support?