pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.62k stars 1.08k forks source link

xarray / vtk integration #4470

Open rabernat opened 4 years ago

rabernat commented 4 years ago

I just had a great chat with @aashish24 and @banesullivan of Kitware about how we could improve interoperability between xarray and the VTK stack They also made me aware of pyvista, which looks very cool. As a user of both tools, I can see it would be great if I could quickly drop into VTK from xarray for advanced 3D visualization.

A low-hanging fruit would be to simply be able to round-trip data between vtk and xarray in memory, much like we do with pandas. This should be doable because vtk already has a netCDF file reader. Rather than reading the data from a file, vtk could initialize its objects from an xarray dataset which, in principle, should contain all the same data / metadata

Beyond this, there are many possibilities for deeper integration around the treatment of finite-volume cells, structured / unstructured meshes, etc. Possibly related to https://github.com/pydata/xarray/issues/4222.

I just thought I would open this issue to track the general topic of xarray / vtk integration.

banesullivan commented 4 years ago

I'm really excited about this effort, thanks @rabernat! One thing I'd like to do before I forget is mention a project from the Software Underground (SWUNG) to create a new subsurface package: https://github.com/softwareunderground/subsurface

Over there, we are working to create a set of low-level data structures for sharing spatial data between various subsurface modeling/analysis software (e.g. GemPy, SimPEG, PyGIMLi, Fatiando, Welly, and more) with close ties to PyVista (VTK) for 3D visualization. At present, we are using xarray as the base data container for structured data types. This morning, @rabernat gave me some great insight into how xarray can be used to contain unstructured data as well and so I will be investigating how to do that so we leverage xarray as the base data container all around in subsurface. Then with subsurface's close ties to PyVista, there comes a bridge between xarray and VTK (via PyVista).

cc @Leguark and @prisae

This would be a proof of concept for the round-trip data interoperability going between xarray and VTK leaving us with some good lessons on what the value would be of a direct interface between VTK and xarray for better handling of large data and leveraging Dask.

Either way, it'd be awesome to have the subsurface development team in the loop here as there appears to be a lot of shared goals

Leguark commented 4 years ago

Thank @banesullivan to bring us to the loop. Is there some example somewhere using xarray for unstructured data or that is exactly what are you going to try next?

rabernat commented 4 years ago

You can see an example of using xarray with structured curvilinear coordinates here:

And with unstructured data here:

The key concept is that xarray supports both dimensions coordinates and non-dimension coordinates. The dimension coordinates must be 1D, but the non-dimension coordinates can have any dimensionality. For a regular lat-lon grid, a variable might have dimensions like this

temp(time, depth, lat, lon)

A structured curvilinear 2D grid might instead look like

temp(time, depth, j, i)

with additional coordinate variables

lon(j, i)
lat(j, i)

which can be used for visualization (but not, currently, for indexing) A fully unstructured mesh in 2D would instead look like

temp(time, depth, cell_id)
lon(cell_id)
lat(cell_id)

This is exactly what netCDF does to encode these data types. Anything that can go into a netCDF file can be represented in Xarray. You just don't get the full functionality in terms of label-based selection. That will hopefully change as we implement more flexible indexing (see https://github.com/pydata/xarray/projects/1).

Another limitation of xarray is that it has no explicit notion of "cell bounds," other than recognizing these as coordinates (see #2844). Our xgcm package works around this limitation in some simple ways.

rabernat commented 4 years ago

A key point I forgot to make...if downstream packages or accessor implementers know how to do something useful with these extra coordinates, they are free to do so! The data are there...xarray just doesn't currently make much use of them.

benbovy commented 4 years ago

if downstream packages or accessor implementers know how to do something useful with these extra coordinates, they are free to do so! The data are there...xarray just doesn't currently make much use of them.

FWIW, we are working on the https://github.com/ESM-VFC/xoak package to easily index and select unstructured data in xarray datasets. This works well with multi-dimensional coordinates and xarray's advanced indexing features. Support for custom indexes and chunked (dask) coordinates is coming up too!

rsignell-usgs commented 4 years ago

Just a note that the only unstructured grid (triangular mesh) example I have is: http://gallery.pangeo.io/repos/rsignell-usgs/esip-gallery/01_hurricane_ike_water_levels.html

I figured out how to make that notebook from the info at: https://earthsim.holoviz.org/user_guide/Visualizing_Meshes.html

The "earthsim" project was developed by the Holoviz team (@jbednar & co) funded by USACE when @dharhas was there. Would be cool to revive this.

The Holoviz team and USACE might not have been aware of the UGRID conventions when they developed that code, so currently it's a bit awkward to go from a UGRID-compliant NetCDF dataset to visualization with Holoviz (as you can see from the Hurricane Ike notebook). That would be low-hanging fruit for any future effort.

jbednar commented 4 years ago

The same dataset as in that Visualizing_Meshes notebook is also shown in https://examples.pyviz.org/bay_trimesh, which is more self-contained and may be a better starting point.

Also note that VTK and pyvista are well supported by Panel, as illustrated in various examples at https://panel.holoviz.org/gallery. It would be great to update those examples to include xarray data sources to more fully capture the pipeline from data to display.

I'm not sure what the specific pain points are with UGRID or what specifically is being requested at https://github.com/pyviz-topics/EarthSim/issues/326, but we'd be happy to have examples either in EarthSim or at examples.pyviz.org that start with UGRID datasets and go via Xarray to either 2D or VTK-based 3D visualizations.

rabernat commented 3 years ago

I just saw this very cool tweet about ipyvista / iris integration and it reminded me of this thread.

Are there any clear steps we can take to help advance the vtk / pyvista / xarray integration further?

RichardScottOZ commented 3 years ago

Ryan what would you start with? I had a use case this week.

RichardScottOZ commented 2 years ago

And again mesh to data array and back, by variety?

banesullivan commented 2 years ago

I think a first step would be to make a data accessor between xarray and VTK/PyVista in accordance with https://xarray.pydata.org/en/stable/internals/extending-xarray.html

This keeps coming up again and again for me. @RichardScottOZ, if you're interested in kicking this off, I'd be happy to chat and help implement this and then outline some future direction for this sort of "integration"

RichardScottOZ commented 2 years ago

Yes, that sounds good. Definitely be useful to put this in some workflows.

RichardScottOZ commented 2 years ago

I use this sort of thing a lot, anyway. https://banesullivan.com/pyvista/examples/raster.html#sphx-glr-pyvista-examples-raster-py

RichardScottOZ commented 2 years ago

So reversing that would require storing the metadata etc. and whatever the meshgrid reversal is?

RichardScottOZ commented 2 years ago

e.g. for a Uniform Grid

image

data = result['density'].reshape(result.dimensions[2],result.dimensions[1],result.dimensions[0])
XC = [result.bounds[0]+200*i for i in range(result.dimensions[0])]
YC = [result.bounds[2]+200*i for i in range(result.dimensions[1])]
ZC = [result.bounds[4]+200*i for i in range(result.dimensions[2])]
da = xr.DataArray(data=data, dims=["z", "y","x"], coords={"z": ZC, "y": YC, "x": XC})

image

jbusecke commented 2 years ago

I am very interested in this sort of functionality as an xarray accessor. If I can help in any way, please let me know. Ideally this work would come in very handy to visualize Oxygen Minimum Zones in the global ocean as isosurfaces of a 3D oxygen array.

RichardScottOZ commented 2 years ago

Ok, so a discussion:-

https://github.com/pyvista/pyvista/discussions/2375

RichardScottOZ commented 2 years ago

Just have to find some time to start.

banesullivan commented 2 years ago

Following up to mention that I've developed pyvista-xarray which provides an interface between vtkRectilinearGrids and xarrays DataArrays via PyVista: https://github.com/pyvista/pyvista-xarray

This is very much in its early stages and I’m hoping to take it quite a bit further. But first, I’d love to gather feedback and find how it works or doesn’t work for the types of data folks have.

At present, this accessor is specifically for rectilinear-style grids but if you have some other types of data, do share so that I can work towards supporting it in pyvista-xarray by opening an issue or discussion over there.

(there is limited support for vtkStructuredGrids as well)

keewis commented 2 years ago

do you want to add this to the ecosystem page in the docs?

banesullivan commented 2 years ago

do you want to add this to the ecosystem page in the docs?

I'd love to! Thanks for the suggestion @keewis! I'll track this in https://github.com/pyvista/pyvista-xarray/issues/21 to contribute after I make my next round of improvements to pyvista-xarray