ioos / APIRUS

API for Regular, Unstructured and Staggered model output (or API R US)
Creative Commons Zero v1.0 Universal

Create lightweight package for vertical coords #2

Open rsignell-usgs opened 9 years ago

rsignell-usgs commented 9 years ago

The calculation of the vertical coordinate according to CF involves a lot of logic -- logic that has already been handled by Iris. We should try to "steal" the code from Iris and create a lightweight package that can compute the vertical coordinate. This should be independent of UGRID or SGRID code.
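
For reference, the CF machinery involved is the formula_terms attribute on the dimensionless vertical coordinate variable, which maps term names to variable names. A minimal sketch of the kind of logic meant here, assuming netCDF4 and a hypothetical ROMS-style file (this is not Iris code):

    import netCDF4

    nc = netCDF4.Dataset("roms_output.nc")  # hypothetical file
    svar = nc.variables["s_rho"]            # dimensionless vertical coordinate

    # CF encodes the recipe, e.g. "sigma: s_rho eta: zeta depth: h"
    tokens = svar.formula_terms.split()
    terms = {tokens[i].rstrip(":"): nc.variables[tokens[i + 1]][:]
             for i in range(0, len(tokens), 2)}

    # ocean_sigma_coordinate, assuming eta(n, j, i), sigma(k), depth(j, i):
    # z(n, k, j, i) = eta(n, j, i) + sigma(k) * (depth(j, i) + eta(n, j, i))
    sigma, eta, depth = terms["sigma"], terms["eta"], terms["depth"]
    z = eta[:, None, ...] + sigma[None, :, None, None] * (depth + eta)[:, None, ...]

The logic Iris adds on top (picking the right formula per standard_name, matching array shapes, laziness) is the part worth extracting.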

ocefpaf commented 9 years ago

The problem:

Comments:

Biggus: I am experiencing a weird bug with biggus where the computation only works if I run it twice. Not sure if I am doing things correctly... At least it is very fast with a DAP URL and a big eta. @pelson can you spot if I am doing things correctly in [2]?

Dask: Sometimes I get a netCDF I/O error, sometimes it completes after a long time. I am not sure if my naive chunking is the root of the problem, or if dask is computing everything just to hand me the indices I asked for at the end.
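
For concreteness, the naive chunking in question is roughly this (hypothetical URL and variable names, not the exact notebook code):

    import dask.array as da
    import netCDF4

    nc = netCDF4.Dataset("http://example.com/thredds/dodsC/model")  # hypothetical
    eta = nc.variables["zeta"]  # e.g. shape (time, j, i)

    # One time step per chunk; everything stays lazy until sliced/computed.
    eta_lazy = da.from_array(eta, chunks=(1,) + eta.shape[1:])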

[1] http://nbviewer.ipython.org/gist/ocefpaf/94b6ea56bbe13a014b8c [2] http://nbviewer.ipython.org/gist/ocefpaf/7f334e81fb92a06ba90e [3] http://nbviewer.ipython.org/gist/ocefpaf/8be9a58b36dd6700ae63

PS: Cell [4] needs a lot of improvement. Iris is really smart at this array shape matching, we might steal some code from them :wink:

ocefpaf commented 9 years ago

@rsignell-usgs I started a package and a to do list. (See https://github.com/pyoceans/odvc/issues/3.)

rsignell-usgs commented 9 years ago

Woohoo, @ocefpaf! Hit the ground running!

pelson commented 9 years ago

We should try to steal the code from Iris and create a lightweight package that can compute the vertical coordinate.

By steal, I'm hopeful you really mean "make a lightweight package for CF vertical coordinates which Iris can use"? I think the pattern that we took with cf_units would work really well here - fork it (into its own package, but being careful of the licensing terms of Iris), get it to the stage where it really is a standalone library, and ultimately get Iris using it (as well as any other tool out there).

Biggus: I am experiencing a weird bug with biggus where the computation only works if I run it twice. Dask: Sometimes I get a netCDF I/O error, sometimes it completes after a long time.

They are practically the same abstraction (though biggus' interface allows you to care less about the structure of your data, whereas dask gives you more control over how to break down the problem). We have a proof of concept which effectively allows you to write biggus code and generate a dask.array graph, so personally I wouldn't invest a lot of time trying to manage the two interfaces...

RuntimeError: NetCDF: I/O failure

This is the exception you are having with biggus. It sounds like you are also having this problem with dask. I suspect you're hitting the THREDDS server concurrently and it is choking, maybe?

All in all, :+1: for creating a lightweight package which can turn NC into vertical coordinate metadata and array, but :-1: if it is a literal fork of Iris' VC handling with no intention of pulling this stuff back into Iris.

ocefpaf commented 9 years ago

By steal, I'm hopeful you really mean "make a lightweight package for CF vertical coordinates which Iris can use"?

That is the goal!

I think the pattern that we took with cf_units would work really well here

Which, BTW, I am very late in finishing and re-integrating into iris :flushed:

  • fork it (into its own package, but being careful of the licensing terms of Iris).

Right now we have only the formulas for the Dimensionless Coordinates, so I am not sure whether any licensing applies. This code was already sitting in many places before it made its way into iris... Which makes me wonder if the CF conventions should provide these as pseudo-code, with a Python implementation that we can refer to... but I digress.
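
To put the amount of code in perspective, each of these formulas is essentially a one-line pure function. A sketch of the ocean_sigma_coordinate one (broadcasting deliberately left to the caller):

    def ocean_sigma_coordinate(sigma, eta, depth):
        # CF: z(n, k, j, i) = eta(n, j, i) + sigma(k) * (depth(j, i) + eta(n, j, i))
        # Shape handling is up to the caller.
        return eta + sigma * (depth + eta)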

I wouldn't invest a lot of time trying to manage the two interfaces

We do not plan to manage both; we are just experimenting to decide which way to go. Right now biggus was easier to implement and much faster at the final slicing (see the notebooks above). It took dask several minutes to finish the same task. That might be my bad chunking, but the fact that biggus allowed me to "care less about the structure of your data" was a winning point for me.

This is the exception you are having with biggus. It sounds like you are also having this problem with dask. I suspect you're hitting the THREDDS server concurrently and it is choking, maybe?

Dask does not raise the exception, just biggus. I do not know how to properly overcome that.

but :-1: if it is a literal fork of Iris' VC handling with no intention of pulling this stuff back into Iris.

Fear not! API'R'US (blame @rsignell-usgs for the name!) is just a playground to get people talking. The goal is to come up with common APIs that everybody can use.

PS: I will raise a few questions about biggus later to understand better what is going on.

ocefpaf commented 9 years ago

Getting close: http://nbviewer.ipython.org/gist/ocefpaf/677c5ab7f80dcf1ba3db

This version is super fast! And Biggus is no longer choking on THREDDS.

Comments please!

I plan to clean this up and add this functionality to https://github.com/pyoceans/odvc.

rsignell-usgs commented 9 years ago

@ocefpaf, this is looking good! The only problem I see so far is the assumption that the vertical coordinate variables are 1D. This is true for ROMS, but not for FVCOM, where siglay, the variable containing the fraction of the water column occupied by each layer, is 3D. Example here: http://comt.sura.org/thredds/dodsC/data/comt_1_archive/inundation_tropical/USF_FVCOM/Hurricane_Ike_3D_final_run_without_waves.html
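
Concretely, the mismatch looks like this (toy sizes and my own names, not FVCOM's actual dimensions):

    import numpy as np

    n, k, j, i = 4, 10, 30, 40
    eta = np.zeros((n, j, i))
    depth = np.full((j, i), 100.0)

    sigma_1d = np.linspace(-1, 0, k)                                 # ROMS-style: sigma(k)
    siglay_3d = np.broadcast_to(sigma_1d[:, None, None], (k, j, i))  # FVCOM-style: 3D

    # The 1D case pads sigma to (1, k, 1, 1); the 3D case pads to (1, k, j, i).
    z_1d = eta[:, None] + sigma_1d[None, :, None, None] * (depth + eta)[:, None]
    z_3d = eta[:, None] + siglay_3d[None] * (depth + eta)[:, None]
    assert z_1d.shape == z_3d.shape == (n, k, j, i)

So the formula itself still works; it is the shape-conforming step that has to stop assuming sigma is 1D.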

ocefpaf commented 9 years ago

Indeed we need to be smarter about that. I just want to get a few cases going before making odvc more general.

As you can see from my comment in the function, this problem only exists in nc2biggus; if the user wants to take care of the input shapes, they can still use formulas.py. For example, right now iris could import formulas.py to refactor the _derive() method in the AuxCoords. (Although I do not think that is a good idea. Let's wait until we can do more with odvc.)

rsignell-usgs commented 9 years ago

@pelson, I just want to confirm what @ocefpaf said above: like with cf_units, the goal here is simply to split this functionality out of Iris to enhance flexibility, and to make sure we do it in a way that is easy for Iris (and other packages) to use.

The overall goal is to create a collection of tools that are as lightweight as possible that make it easy to work with both unstructured and structured grid model results.

ChrisBarker-NOAA commented 9 years ago

""" The overall goal is to create a collection of tools that are as lightweight as possible that make it easy to work with both unstructured and structured grid model results. """ And to be able to do that with Iris, or Xray, or ????

So it's still the goal to have py_ugrid and pysgrid be libs that Iris can use to support these grid types.

-CHB

ocefpaf commented 9 years ago

this is looking good! The only problem I see so far is the assumption that the vertical coordinate variables are 1D. This is true for ROMS, but not for FVCOM, where siglay, the variable containing the fraction of the water column occupied by each layer, is 3D.

@rsignell-usgs I was investigating a way to use the CF definition (see below) to conform the arrays into the proper shapes for computing z.

z(n, k, j, i) = eta(n, j, i) + sigma(k) *
                (depth(j, i) + eta(n, j, i))

The problem with FVCOM is that siglay is 3D, while the CF definition above assumes a 1D sigma(k).

In a way, either the CF conventions are wrong or FVCOM is not CF :wink:

Anyway, I found that xray has a convenience method that might help. Still, I do not want to write xray-specific code, so we have to be smarter about this and implement our own lightweight variable(dims) checks and broadcasting.
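
Something along these lines is what I have in mind: a hypothetical helper (not actual odvc code) that aligns an array by its named dimensions instead of by rank:

    import numpy as np

    def conform(var, dims, target_dims):
        # Insert size-1 axes so an array with named dims `dims` broadcasts
        # against arrays laid out as `target_dims`. Assumes `dims` appear
        # in `target_dims` in the same relative order.
        slicer = tuple(slice(None) if d in dims else np.newaxis
                       for d in target_dims)
        return np.asarray(var)[slicer]

    sigma = np.linspace(-1, 0, 20)  # dims: ("k",)
    conform(sigma, ("k",), ("time", "k", "j", "i")).shape  # (1, 20, 1, 1)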

rsignell-usgs commented 9 years ago

@ocefpaf, you are right that by those precise definitions, the vertical coordinate in FVCOM is not strictly CF. But there is no real reason to restrict the computation of the coordinate to only cases where sigma and eta have specific numbers of dimensions. All that is necessary is that we can compute the vertical coordinate at each location using the formula, so our library should work with that, broadcasting as necessary. Hmmm... I think all I really just said was "Yes, I agree". :smile_cat:

ocefpaf commented 9 years ago

I have a prototype that is almost working here. I will upload that soon...

ocefpaf commented 9 years ago

Getting close! There are still a few things to do.

NB that we do not "have" an API yet. Except for prepare_arrays (which is a mess), I tried to break all the functionality into the most atomic functions possible, so hopefully we can now glue the pieces together and build a nice API from the ground up: pass a file/URL/nc object and get the z-coord back. The only part of the code I do not want to modify is formulas.py; I want that to be just the equations as we read them in the CF conventions.

@rsignell-usgs see ocean_sigma_coordinate in action for both FVCOM and POM:

http://nbviewer.ipython.org/github/pyoceans/odvc/tree/master/notebooks/

ocefpaf commented 9 years ago

Does anyone have a DAP endpoint for an ocean_double_sigma dataset?

rsignell-usgs commented 9 years ago

@ocefpaf, let's not worry about ocean_double_sigma or ocean_sigma_over_z as I'm not sure anyone is actively distributing model results using these formulas.

Regarding the intermittent errors with OPeNDAP, do you still think it would be a good idea to see if we could get the xray retry approach you reference above into netCDF4-Python?

ocefpaf commented 9 years ago

let's not worry about ocean_double_sigma or ocean_sigma_over_z

Sure. I won't make that a priority.

Regarding the intermittent errors with OPeNDAP, do you still think it would be a good idea to see if we could get the xray retry approach you reference above into netCDF4-Python?

The intermittent errors with OPeNDAP happened because dask was hitting the THREDDS server concurrently and the server choked. The xray retry approach would make things slower in that case. I solved it by disabling dask's multi-threading; the errors are gone and things are surprisingly faster now.
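
For reference, the workaround amounts to forcing dask's single-threaded scheduler so requests are issued one at a time. A sketch with the current dask API (what I actually called at the time was the older equivalent):

    import dask
    import dask.array as da
    import numpy as np

    dask.config.set(scheduler="synchronous")  # single thread, no concurrent reads

    x = da.from_array(np.arange(1000), chunks=100)
    x.sum().compute()  # the whole graph now runs in one thread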

Now... Getting that into netCDF4-python might be a good idea. I will experiment with that later in a class similar to @kwilcox's EnhancedNetcdf.