Lazy loading of coord info?

ioos / APIRUS

API for Regular, Unstructured and Staggered model output (or API R US)

Creative Commons Zero v1.0 Universal


Lazy loading of coord info? #8

Open ChrisBarker-NOAA opened 8 years ago

ChrisBarker-NOAA commented 8 years ago

In the use case doc, there is:

not load any coordinate data or data values

Question: I think the coordinate data needs to be loaded to do anything. So is it so bad to load it up into memory when the object is created?

Clearly, lazy loading is critical for all the data associated with the grid -- it can be huge, and a given use case may not need all of it anyway.

NOTE: py_ugrid currently does load up the grid info up front.
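To make the trade-off concrete, here is a minimal sketch of the split being discussed: coordinates read when the object is created, data variables read only on first request. The `LazyGrid` class and the two loader callables are hypothetical names used for illustration; this is not how py_ugrid is implemented.

```python
class LazyGrid:
    """Hypothetical grid object; the loader callables are assumptions."""

    def __init__(self, load_coords, load_variable):
        # Coordinates are needed for almost every operation, so read them now.
        self.nodes = load_coords()
        self._load_variable = load_variable
        self._variables = {}

    def variable(self, name):
        """Read a data variable on first request and cache it afterwards."""
        if name not in self._variables:
            self._variables[name] = self._load_variable(name)
        return self._variables[name]


# Fake loaders that record what was actually read.
reads = []


def load_coords():
    reads.append("coords")
    return [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def load_variable(name):
    reads.append(name)
    return [10.0, 11.0, 12.0]


grid = LazyGrid(load_coords, load_variable)
print(reads)            # ['coords']
grid.variable("temp")
grid.variable("temp")   # served from the cache, no second read
print(reads)            # ['coords', 'temp']
```

Only the variables a use case actually touches are ever read, while the coordinates are always available.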

kwilcox commented 8 years ago

@rsignell-usgs and I talked about this briefly. If we are going to have a common API between grid types, the individual libraries that "know" about that type of grid (pysgrid, pyugrid, etc.) will be on their own to determine the best way to do this... and I think that is what we want.

Ideally each library will keep some sort of spatial tree index to make lookups fast (for example, when extracting time-series or sub-setting the grid). That index can be built when the object is loaded, when the method is called, or loaded from a disk serialization... it would be up to the implementing library.
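The three options could look roughly like the sketch below. Everything here is an assumption made for illustration: `IndexedGrid`, `build_index`, and the `eager` and `index_path` parameters are hypothetical, and a simple dictionary of buckets stands in for a real spatial tree.

```python
import pickle
from functools import cached_property
from pathlib import Path

BUCKET = 1.0  # bucket size in coordinate units (arbitrary choice)


def bucket_key(x, y):
    return (int(x // BUCKET), int(y // BUCKET))


def build_index(nodes):
    """Bucket node numbers by grid square (a stand-in for a real spatial tree)."""
    index = {}
    for i, (x, y) in enumerate(nodes):
        index.setdefault(bucket_key(x, y), []).append(i)
    return index


class IndexedGrid:
    """Hypothetical grid wrapper; not the API of pyugrid or pysgrid."""

    def __init__(self, nodes, index_path=None, eager=False):
        self.nodes = list(nodes)
        self.index_path = Path(index_path) if index_path else None
        if eager:
            self.index  # option 1: build when the object is loaded

    @cached_property
    def index(self):
        # Option 3: reuse a serialized index from disk if one exists.
        # (pickle should only be used on files you trust.)
        if self.index_path and self.index_path.exists():
            with self.index_path.open("rb") as f:
                return pickle.load(f)
        # Option 2: otherwise build on first use; cached_property keeps it.
        index = build_index(self.nodes)
        if self.index_path:
            with self.index_path.open("wb") as f:
                pickle.dump(index, f)
        return index

    def nodes_near(self, x, y):
        """Return node numbers in the same bucket as (x, y)."""
        return self.index.get(bucket_key(x, y), [])


grid = IndexedGrid([(0.2, 0.3), (0.8, 0.1), (3.5, 2.5)])
print("index" in vars(grid))      # False: nothing built yet
print(grid.nodes_near(0.5, 0.5))  # [0, 1]
print("index" in vars(grid))      # True: built on first lookup
```

Callers only ever see `nodes_near`, so each library is free to pick whichever strategy suits its grid type.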

ocefpaf commented 8 years ago

some sort of spatial tree index.

@kwilcox I am just curious. You seem to use rtree a lot instead of the more popular kdtree. What are your reasons behind it?

Also, most of the time we use naive trees, I mean trees that do not take the underlying projection into account. I am not an expert, but I do know that this approach can yield wrong results in certain cases.
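One such case, as a small sketch: at high latitudes a degree of longitude is much shorter than a degree of latitude, so ranking neighbors by distance in raw lon/lat degrees can pick the wrong point. The points below are made up for illustration, and a brute-force `min` stands in for the tree query; the distance metric is the part that matters.

```python
import math


def degree_distance(a, b):
    """Naive planar distance, treating (lon, lat) degrees as x, y."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance on a sphere between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


query = (0.0, 80.0)  # (lon, lat)
candidates = {"east": (5.0, 80.0), "north": (0.0, 82.0)}

naive = min(candidates, key=lambda k: degree_distance(query, candidates[k]))
sphere = min(candidates, key=lambda k: great_circle_km(query, candidates[k]))
print(naive, sphere)  # north east
```

In degrees "north" looks closer (2 versus 5), but on the sphere "east" is less than half as far away.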

PS: Note that with PR https://github.com/ioos/conda-recipes/pull/452 we finally have rtree on Windows. I am just not sure if it is working properly... Calling for testers!

ChrisBarker-NOAA commented 8 years ago

@kwilcox: yup -- good point, this is implementation detail, not API.

@ocefpaf: kdtree is for points, r-tree is for bounding boxes. If you want to know what cell you are in, then you can use a kdtree to find the closest point, but that doesn't actually tell you what cell you're in. You still need to check and search around the location for the right cell.

So an r-tree may be an easier option.
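A tiny made-up mesh shows the problem. In the sketch below the nearest node to the query point is not even a vertex of the cell that contains it, while a bounding-box filter followed by an exact point-in-triangle test finds the right cell. Brute-force loops stand in for both trees here (an assumption to keep the example dependency-free); a kdtree would speed up `nearest_node` and an r-tree would speed up the bounding-box step in `locate_cell`.

```python
import math

nodes = [(0.0, 0.0), (10.0, 0.0), (5.0, 1.0), (5.0, -0.3)]
cells = [(0, 1, 2), (0, 3, 1)]  # triangles, given as node numbers


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, tri):
    """True if p is inside or on the edge of the triangle."""
    a, b, c = (nodes[i] for i in tri)
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (min(d) < 0 and max(d) > 0)


def bbox(tri):
    xs = [nodes[i][0] for i in tri]
    ys = [nodes[i][1] for i in tri]
    return min(xs), min(ys), max(xs), max(ys)


def nearest_node(p):
    """What a kdtree query would return (done by brute force here)."""
    return min(range(len(nodes)), key=lambda i: math.dist(p, nodes[i]))


def locate_cell(p):
    """Bounding-box filter (the r-tree's job), then an exact test."""
    for c, tri in enumerate(cells):
        xmin, ymin, xmax, ymax = bbox(tri)
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax and in_triangle(p, tri):
            return c
    return None


p = (5.0, 0.2)
n = nearest_node(p)
print(n)                                               # 3
print([c for c, tri in enumerate(cells) if n in tri])  # [1]
print(locate_cell(p))                                  # 0
```

Starting from node 3 you would have to walk outward from cell 1 to reach cell 0, which is the extra "search around the location" step.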

Note that we have a new implementation of a "cell tree", which was designed specifically to find cells in unstructured grids. We have it for triangular meshes and quad meshes (so it can be used for curvilinear grids). It's not in a public repo yet, but the plan was to use it in pyugrid.

Also -- pyugrid currently has an optional kdtree for use in the nearest point method.

ocefpaf commented 8 years ago

Thanks @ChrisBarker-NOAA.