ioos / APIRUS

API for Regular, Unstructured and Staggered model output (or API R US)
Creative Commons Zero v1.0 Universal

A lightweight CF model #10

Open ocefpaf opened 8 years ago

ocefpaf commented 8 years ago

Since the first time I saw iris I fell in love with its interpretation of the CF-conventions*. It is not simple metadata bookkeeping like column/index labels in pandas or xray, nor a "bag" dictionary holding all the metadata. It is a full-fledged CF-convention parser that creates a Python object, propagating units, checking for compliance, etc.

I am completely unfamiliar with the tool iris uses to do this (pyke) and I never looked into the details of the implementation. However, it would be extremely useful if we could take the approach used in cf_units and create a standalone module that generates a CF-python object. This object could be used by iris to create the cube. And it would also be possible to use it to create other objects, maybe even a CF xray.Dataset.

Note that there is a cf_python module out there, but I never checked whether it fits our needs. (Well... We have to define our needs first, don't we?)

I believe we do not have the manpower to do this right now, but I wanted to open the issue here to keep this idea alive and start a discussion.

Pinging @pelson and @rhattersley. Are you :+1: or :-1: ? Do you think this is possible? Do you think this is useful? Or do you think this is a wild-goose chase?

* The truth is that this is a love-and-hate relationship. The CF interpretation is so good that it brings all of CF's shortcomings to the cube :stuck_out_tongue_winking_eye:

rhattersley commented 8 years ago

Thanks for the ping. :smile:

I am completely unfamiliar with the tool iris uses to do this (pyke)

Pyke is not core to Iris at all. It just happens to be used to translate CF-netCDF files into Cubes, but it's the Cube which embodies CF in a Python object.

it would be extremely useful if we could take the approach used in cf_units and create a standalone module that generates a CF-python object. This object could be used by iris to create the cube.

I'm keen to explore a core + optional extras model with Iris (e.g. https://github.com/SciTools/iris-extras/issues/7 and https://github.com/SciTools/iris/pull/1789). The improving package/dependency management tools make it more feasible for us to pull capabilities out of the core Iris package and into extension packages. In the logical conclusion of that model the "CF-python object" is the Cube. I'm guessing you don't see things in quite the same way though, so I'm eager to understand the difference. Speaking of which...

it would also be possible to use it to create other objects, maybe even a CF xray.Dataset.

How would a "CF xray.Dataset" differ from your "CF-python object"?

The CF interpretation is so good that it brings all of CF's shortcomings to the cube :stuck_out_tongue_winking_eye:

I think you once said something roughly equivalent to "I use xray by default and iris when I need CF compliance". I'd love to know more about what makes you reach for Iris.

rsignell-usgs commented 8 years ago

I had a long talk with @kwilcox about this yesterday.

The CF model itself actually consists of functionality that can be separated: unit conversion, vertical coordinate calculation, standard_name manipulation, handling of different common data model featureTypes (Grid, Point, TimeSeries, TimeSeriesProfile, Profile, Trajectory, TrajectoryProfile). Grid handles only data which is colocated with coordinate values.

To handle many of the newer oceanographic, atmospheric and hydrologic models, we also need support for grids where the data is not colocated with the coordinate data (staggered grid) and data which is on a non-rectangular mesh (unstructured grid). This was the motivation behind the UGRID and SGRID conventions, and the "pyugrid" and "pysgrid" packages.

We were thinking that if these packages could provide standard methods for these regular grid, ugrid or sgrid objects (e.g. subsetting and regridding methods that return specific featureTypes) then they could be passed into functions that would do things like return a vertical transect along a specified path, regardless of the type of object. And folks who come up with some other type of model feature type (possibly a spectral representation for FEM models like the Imperial College ICOM model) could create their own package, as long as they provided the appropriate methods.
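The "standard methods" idea above can be sketched with a structural interface. This is only an illustration under stated assumptions: `SupportsSampling`, `RegularGrid`, `UnstructuredMesh`, `sample` and `transect` are hypothetical names, not part of pyugrid, pysgrid or Iris, and nearest-neighbour lookup stands in for real regridding.

```python
from typing import Protocol, Sequence

Point = tuple[float, float]  # (lon, lat)


class SupportsSampling(Protocol):
    """Hypothetical shared interface: any grid type that can be sampled at points."""

    def sample(self, points: Sequence[Point]) -> list[float]: ...


class RegularGrid:
    """Toy regular grid: 1D lon/lat axes, values indexed as values[j][i]."""

    def __init__(self, lons, lats, values):
        self.lons, self.lats, self.values = lons, lats, values

    def sample(self, points):
        out = []
        for lon, lat in points:
            # Nearest index along each axis independently.
            i = min(range(len(self.lons)), key=lambda k: abs(self.lons[k] - lon))
            j = min(range(len(self.lats)), key=lambda k: abs(self.lats[k] - lat))
            out.append(self.values[j][i])
        return out


class UnstructuredMesh:
    """Toy unstructured mesh: one value per (lon, lat) node."""

    def __init__(self, nodes, values):
        self.nodes, self.values = nodes, values

    def sample(self, points):
        out = []
        for lon, lat in points:
            # Nearest node by squared distance.
            n = min(
                range(len(self.nodes)),
                key=lambda k: (self.nodes[k][0] - lon) ** 2 + (self.nodes[k][1] - lat) ** 2,
            )
            out.append(self.values[n])
        return out


def transect(grid: SupportsSampling, path: Sequence[Point]) -> list[float]:
    """Extract values along a path without caring which grid type is passed in."""
    return grid.sample(path)


regular = RegularGrid([0.0, 1.0], [0.0, 1.0], [[1.0, 2.0], [3.0, 4.0]])
mesh = UnstructuredMesh([(0.0, 0.0), (1.0, 1.0)], [10.0, 20.0])
path = [(0.1, 0.1), (0.9, 0.9)]
regular_values = transect(regular, path)
mesh_values = transect(mesh, path)
```

The point of the design is that `transect` depends only on the method, so a new grid package (spectral, staggered, or otherwise) would only need to provide `sample` to be usable.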

Could Iris be the package that orchestrates this functionality?

I don't see why not. The main things that keep me from using Iris more are: (1) awkward slicing on coordinate values (e.g. compared to Xray); (2) long time to open and inspect a dataset; (3) lack of a dataset concept; (4) monolithic structure.

Yet (1) is probably easily overcome, (2) may be just a question of learning how to inspect a dataset with Iris (using raw over strict), (3) may not be a real problem as long as cube lists don't actually duplicate coordinate data and (4) is being worked on.

ocefpaf commented 8 years ago

Pyke is not core to Iris at all. It just happens to be used to translate CF-netCDF files into Cubes

I did not say "core of iris." But bear in mind that 99.99% of the time our data is in the netCDF format. That means pyke, for us, is the CF parser in iris.

but it's the Cube which embodies CF in a Python object.

What we imagine is an object one step behind the cube. Maybe just a new netCDF object with some CF modifications and checked for compliance, or a dict of dicts mapping nc.variables and nc.dimensions to CF definitions. I must sound like an 8 year old wishing for a dirt bike with a rocket :bike: + :rocket:
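A minimal sketch of the "dict of dicts" idea, assuming the variable and dimension metadata have already been read from the file into plain dictionaries. The function name `build_cf_summary`, the input layout and the single consistency check are all hypothetical; a real implementation would read from netCDF and check far more of CF.

```python
def build_cf_summary(dimensions, variables):
    """Map each variable name to a small dict of CF-relevant facts.

    dimensions: {dimension_name: size}
    variables:  {variable_name: {"dims": tuple_of_names, "attrs": dict}}
    """
    summary = {}
    for name, var in variables.items():
        attrs = var.get("attrs", {})
        dims = tuple(var.get("dims", ()))
        # Illustrative check only: every dimension used must be declared.
        problems = [f"undeclared dimension: {d}" for d in dims if d not in dimensions]
        summary[name] = {
            "dimensions": dims,
            "standard_name": attrs.get("standard_name"),
            "units": attrs.get("units"),
            # In CF, a coordinate variable is 1D and shares its dimension's name.
            "is_coordinate_variable": dims == (name,),
            "problems": problems,
        }
    return summary


dimensions = {"time": 3, "depth": 2}
variables = {
    "time": {
        "dims": ("time",),
        "attrs": {"standard_name": "time", "units": "hours since 2000-01-01"},
    },
    "temp": {
        "dims": ("time", "depth"),
        "attrs": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    },
    "salt": {
        "dims": ("time", "level"),  # "level" is deliberately not declared
        "attrs": {"standard_name": "sea_water_salinity"},
    },
}
summary = build_cf_summary(dimensions, variables)
```

Because the result is plain data, it could in principle feed a cube constructor, an xray.Dataset, or anything else, which is the "one step behind the cube" role described above.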

I'm keen to explore a core + optional extras model with Iris (e.g. SciTools/iris-extras#7 and SciTools/iris#1789).

I guess that grid support, like pyugrid and pysgrid, falls into the optional extras category.

In the logical conclusion of that model the "CF-python object" is the Cube. I'm guessing you don't see things in quite the same way though, so I'm eager to understand the difference. Speaking of which...

The cube is more than the CF-object, and that is the main problem. My imaginary CF-object would be a lighter cube-like constructor behind the cube. Here are some examples of why we want something like this:

How would a "CF xray.Dataset" differ from your "CF-python object"?

There is no "CF xray.Dataset" yet, but the CF-python object would help create it. One could add a vertical coordinate to the Dataset using the information parsed by the CF-python object. If someone wants to do this in xray right now they would have to re-invent the wheel. The CF-python object would provide the wheel parts for this task, so it would no longer be re-inventing the wheel but rather "assembling the wheel."

Maybe these two examples will help:

If we could have an intermediate object maybe we could do this:

formula_terms = awesome_cf_object.get_formula_terms()

The formula_terms would be a mapping to the formula terms vars, dimensions, standard_name, etc. All parsed in a way similar to what iris does, and checked for compliance.
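As a rough sketch of what such a mapping could look like, the snippet below parses the CF `formula_terms` attribute string (pairs of `term: variable`) into a dict and checks it against the terms CF lists for two ocean vertical coordinates. The function names and the `REQUIRED_TERMS` table are assumptions for illustration, not iris or cf_units API, and the table covers only two standard names.

```python
# Required terms for two parametric vertical coordinates (CF Appendix D).
# Deliberately incomplete: only two standard names are listed here.
REQUIRED_TERMS = {
    "ocean_sigma_coordinate": {"sigma", "eta", "depth"},
    "ocean_s_coordinate_g1": {"s", "C", "eta", "depth", "depth_c"},
}


def parse_formula_terms(attr):
    """Turn 'sigma: sigma eta: zeta depth: h' into {'sigma': 'sigma', ...}."""
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"formula_terms must be 'term: variable' pairs: {attr!r}")
    terms = {}
    for key, variable in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'term:' but found {key!r}")
        terms[key[:-1]] = variable
    return terms


def missing_terms(standard_name, terms):
    """Return the sorted list of required terms absent from the parsed mapping."""
    required = REQUIRED_TERMS.get(standard_name, set())
    return sorted(required - set(terms))


formula_terms = parse_formula_terms("sigma: sigma eta: zeta depth: h")
incomplete = parse_formula_terms("s: s_rho C: Cs_r eta: zeta depth: h")
```

Here `missing_terms("ocean_s_coordinate_g1", incomplete)` reports that `depth_c` is absent, which is the kind of compliance feedback the intermediate object could hand to whichever library builds on it.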

I think you once said something roughly equivalent to "I use xray by default and iris when I need CF compliance". I'd love to know more about what makes you reach for Iris.

I am writing a blog post about this. Can you wait for it? :stuck_out_tongue_winking_eye:

rsignell-usgs commented 8 years ago

@lesserwhirls and @dopplershift, I'm bringing you guys into this discussion too, because it would be great if we could all be working toward harmonization of access in python to the common data model featureType objects, and I know you are working on the Siphon API for accessing Unidata technologies.

rhattersley commented 8 years ago

I am writing a blog post about this. Can you wait for it? :stuck_out_tongue_winking_eye:

Depends how long I need to wait... :stuck_out_tongue_winking_eye:

ocefpaf commented 8 years ago

Oops. My laptop died with that post on it and I never configured the new one for the blog... Sorry.

rhattersley commented 8 years ago

Are you planning to create a new post? Either way, I'd still love to know more about what helps/hinders your usage of Iris.

ocefpaf commented 8 years ago

Are you planning to create a new post?

Yes. As soon as I have some free time to restore my old HDD.

Either way, I'd still love to know more about what helps/hinders your usage of Iris.

In short, the post will be about how the CF model in iris helps our workflow.

PS: The hindrances are mostly the slicing (the reason why xarray is so popular is the pandas-like slicing) and the lack of support for 2D coordinates (99% of ocean models use 2D coords).

rhattersley commented 8 years ago

Yes. As soon as I have some free time to restore my old HDD.

Super! Thank you! :smile:

The hinders are mostly the slicing...

I'm trying to get a shared plan together for that: https://github.com/SciTools/iris/wiki/IEP-1

ocefpaf commented 8 years ago

I'm trying to get a shared plan together for that: https://github.com/SciTools/iris/wiki/IEP-1

Awesome! I made a few comments here:

https://via.hypothes.is/https://github.com/SciTools/iris/wiki/IEP-1

I guess that hypothes.is needs chrome/chromium to work.

rsignell-usgs commented 8 years ago

@rhattersley Here's an example that shows the kind of thing that hinders usage of Iris. In this notebook, the user just wants to do something very simple and common: extract time series data in a specified date range and plot them up: https://gist.github.com/rsignell-usgs/13d7ce9d95fddb4983d4cbf98be6c71d

Not only is the xarray syntax a lot simpler, but it's a lot faster. The speeds are listed in the notebook, but I'm summarizing them here:

Xarray: 1 loop, best of 3: 857 ms per loop
Iris: 1 loop, best of 3: 1min per loop

Xarray is roughly 70 times faster (60 s / 0.857 s)!

rhattersley commented 8 years ago

I guess that hypothes.is needs chrome/chromium to work.

@ocefpaf - chrome was the only browser that showed the overlay widgets, but even with chrome I couldn't see any comments.

Here's an example that shows the kind of thing that hinders usage of Iris.

@rsignell-usgs - thanks! :+1:

ocefpaf commented 8 years ago

@ocefpaf - chrome was the only browser that showed the overlay widgets, but even with chrome I couldn't see any comments.

Weird, I lost the comments too. I guess it is because the wiki was modified. Anyway, I just wanted to avoid making this thread longer... so here it goes (short version):

rhattersley commented 8 years ago

I just wanted to avoid making this thread longer

:+1: We can move any further discussion to SciTools/iris#1988.