ioos / APIRUS

API for Regular, Unstructured and Staggered model output (or API R US)
Creative Commons Zero v1.0 Universal

How to get the grid using pysgrid, pyugrid API? #15

Open ocefpaf opened 8 years ago

ocefpaf commented 8 years ago

Right now we have 2 ways to load the grid info using py{s,u}grid:

pysgrid.from_ncfile
pysgrid.from_nc_dataset

pyugrid.UGrid.from_ncfile
pyugrid.UGrid.from_nc_dataset

Note that in pyugrid the method that loads the grid is available as a class method only.

I propose we:

@ioos/apirus Ideas?

rsignell-usgs commented 8 years ago

Do you mean we would have a package and class with:

apirus.load_grid()

ocefpaf commented 8 years ago

No. I mean both pyugrid and pysgrid would have a .load_grid().

However, nothing stops us from creating something like apirus.load_grid(). I have an "EnhancedIrisCube" object that does that and returns the grid and grid_type as properties.

rsignell-usgs commented 8 years ago

Got it. That enables the "duck typing" we were talking about, right?

ChrisBarker-NOAA commented 8 years ago

OK, that's odd -- I could have sworn UGrid already had a "loadgrid" of some sort that would check what was being passed in... but I guess not.

So yes, pyugrid and pysgrid should have a .load_grid(). (classmethod, I think)

And then, an apirus.load_grid() in the longer run -- ideally one could load a data file and do analysis on it, and not have to know what type of grid it is at all.

Which, now that I think about it, might mean that the functionality of determining whether the passed-in object is a filename, a URL, or a netCDF4.Dataset should actually be in apirus, not in both (or more) of the grid objects...
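To make the idea of a single input check concrete, here is a minimal sketch. The name `classify_source` is hypothetical (it is not part of apirus, pysgrid, or pyugrid), and the rule "has `variables` and `close`, so treat it as an open dataset" is an assumption made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Return 'dataset', 'url', or 'path' for the object passed in.

    Hypothetical helper: a sketch of the check discussed above, not an
    existing apirus function.
    """
    # Duck typing: anything exposing `variables` and `close` is assumed to
    # be an already-open dataset (netCDF4.Dataset has both attributes).
    if hasattr(source, "variables") and hasattr(source, "close"):
        return "dataset"
    if isinstance(source, Path):
        return "path"
    if isinstance(source, str):
        # Only http/https are treated as URLs here; everything else is
        # assumed to be a local file name.
        scheme = urlparse(source).scheme
        return "url" if scheme in ("http", "https") else "path"
    raise TypeError(f"cannot load a grid from {type(source).__name__}")
```

With a check like this living in one place, the grid classes would only ever need to handle an open dataset.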

ocefpaf commented 8 years ago

Got it. That enables the "duck typing" we were talking about, right?

Yes. We don't want to keep track of whether it is an nc object or an ncfile/url; we just want the grid.

Not to mention that code using both libraries, to load different grid types, would look more consistent.
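A rough sketch of what a shared .load_grid() could look like, assuming a common abstract base class. `Grid`, `ToySGrid`, and `ToyUGrid` are made-up names for illustration; they are not the real pysgrid/pyugrid classes, and the toy loaders only store the source instead of reading a file.

```python
from abc import ABC, abstractmethod


class Grid(ABC):
    """Hypothetical common base class, not the actual APIRUS ABC."""

    grid_type = None

    def __init__(self, source):
        self.source = source

    @classmethod
    @abstractmethod
    def load_grid(cls, source):
        """Build a grid from a file name, URL, or open dataset."""


class ToySGrid(Grid):
    grid_type = "sgrid"

    @classmethod
    def load_grid(cls, source):
        # A real implementation would parse the SGRID metadata here.
        return cls(source)


class ToyUGrid(Grid):
    grid_type = "ugrid"

    @classmethod
    def load_grid(cls, source):
        # A real implementation would parse the UGRID mesh topology here.
        return cls(source)


# The calling code looks the same for both grid types.
for grid_class in (ToySGrid, ToyUGrid):
    grid = grid_class.load_grid("model_output.nc")
    print(grid.grid_type, grid.source)
```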

hetland commented 8 years ago

I think .load_grid() makes sense as a consistent method across the various *grid classes. +1

On Thu, Oct 22, 2015 at 3:51 PM, Filipe notifications@github.com wrote:

Got it. That enables the "duck typing" we were talking about, right?

Yes. We don't want to keep track of whether it is an nc object or an ncfile/url; we just want the grid.

Not to mention that code using both libraries to load different grid types would look more consistent.


Prof. Rob Hetland Texas A&M Univ. – Dept. of Oceanography http://pong.tamu.edu/~rob

ChrisBarker-NOAA commented 8 years ago

OK: I added the method to the ABC:

Which brought up a few questions:

1) At least in the UGRID standard, there can be more than one mesh in a file -- I have no idea how common that is, but there should maybe be a way to specify which mesh is wanted. UGrid currently has:

from_ncfile(nc_url, mesh_name=None, load_data=False)

If you don't specify a mesh_name, then it will check if there is more than one -- if only one, it returns the UGrid, if more than one, it raises an exception.
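The selection rule described above can be sketched roughly as follows. This is not the pyugrid implementation: `find_mesh_names` and `select_mesh` are hypothetical names, and `variables` is a plain dict mapping variable names to attribute dicts, standing in for an open dataset. The only UGRID detail relied on is that a mesh topology variable carries `cf_role = "mesh_topology"`. Raising when no mesh is found at all is an assumption; the description above only covers the "more than one" case.

```python
def find_mesh_names(variables):
    """Names of variables whose cf_role attribute is 'mesh_topology'."""
    return [
        name
        for name, attrs in variables.items()
        if attrs.get("cf_role") == "mesh_topology"
    ]


def select_mesh(variables, mesh_name=None):
    """Pick the mesh to load, mirroring the mesh_name=None behaviour."""
    names = find_mesh_names(variables)
    if mesh_name is not None:
        if mesh_name not in names:
            raise ValueError(f"no mesh named {mesh_name!r}; found {names}")
        return mesh_name
    if len(names) == 1:
        return names[0]
    # Zero meshes also ends up here (assumption, see the text above).
    raise ValueError(f"expected exactly one mesh, found {len(names)}: {names}")


# Stand-in for a dataset with a single mesh and one data variable.
variables = {
    "mesh1": {"cf_role": "mesh_topology"},
    "temperature": {"mesh": "mesh1"},
}
print(select_mesh(variables))
```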

Also load_data is a flag indicating whether you want to load up all the variables associated with the grid as well. This is very expensive with the current "bring it all into memory" model, but if/when we support lazy-loading properly, there may be no need for this flag.

Should APIRUS have a similar API?

ocefpaf commented 8 years ago

at least in the UGRID standard, there can be more than one mesh in a file

@ChrisBarker-NOAA that is a valid point of concern! Thanks for bringing that up.

I am not too familiar with UGRIDs, and I guess that even SGRIDs can bring more than one mesh when there is nesting involved. (Is that assumption correct?)

I have seen APIs with two methods like:

load_grid() # Will return only one and if more than one is found will raise an error
load_grids() # Will return a list of grids and if only one is found it will be a list with a single element

I like that approach. I think that it is clearer than passing names around. And it is especially helpful when we are exploring a dataset we know nothing about.
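A tiny sketch of that two-method pattern, assuming the discovery step has already produced the grids found in the file. Here `found` is just a list of names used as a stand-in, and both functions are illustrations rather than an existing API.

```python
def load_grids(found):
    """Always return a list, even when only one grid was found."""
    return list(found)


def load_grid(found):
    """Return the only grid, or raise if there is not exactly one."""
    grids = load_grids(found)
    if len(grids) != 1:
        raise ValueError(f"expected exactly one grid, found {len(grids)}")
    return grids[0]


print(load_grids(["mesh1", "mesh2"]))  # exploring an unknown dataset
print(load_grid(["mesh1"]))            # the common single-grid case
```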

Also load_data is a flag indicating whether you want to load up all the variables associated with the grid as well.

@rsignell-usgs and @kwilcox already added lazy loading as a goal for loading data. I would like to extend that to grids too. Some UGRIDs can be huge.

rsignell-usgs commented 8 years ago

@ChrisBarker-NOAA, the dataset can have more than one grid, but each variable can only have one grid, right?

ChrisBarker-NOAA commented 8 years ago

yup -- a variable can only be associated with one grid.

Also -- the standard allows more than one grid in a single file -- but I have no idea if anyone actually does that. But I suspect Bert put that in for a reason.

ChrisBarker-NOAA commented 8 years ago

@ocefpaf : OK, you may have convinced me that lazy loading of grid info is worthwhile -- that way, we don't have to make a distinction between querying the dataset to see what's there, and loading it up.

i.e. when there is more than one grid in a file.
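One way lazy loading of grid info could work is sketched below with `functools.cached_property` (Python 3.8+). `LazyGrid` and the `read_nodes` callable are hypothetical. The point is only that creating the object is cheap, so listing every grid in a file costs nothing until the coordinates are actually requested.

```python
from functools import cached_property


class LazyGrid:
    """Hypothetical grid wrapper that defers the expensive read."""

    def __init__(self, name, read_nodes):
        self.name = name
        # Callable that performs the actual (expensive) read when invoked.
        self._read_nodes = read_nodes

    @cached_property
    def nodes(self):
        # Runs on first access only; the result is cached on the instance.
        return self._read_nodes()


def fake_read():
    # Stand-in for reading node coordinates from a file or URL.
    print("reading nodes...")
    return [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


grid = LazyGrid("mesh1", fake_read)  # nothing is read yet
print(grid.name)
print(len(grid.nodes))               # triggers the read
print(len(grid.nodes))               # served from the cache
```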