Open ChrisBarker-NOAA opened 9 years ago
@rsignell-usgs and I talked about this briefly. If we are going to have a common API between grid types, the individual libraries that "know" about that type of grid (pysgrid, pyugrid, etc.) will be on their own to determine the best way to do this... and I think that is what we want.
Ideally each library will keep some sort of spatial tree index to make lookups fast (for example, when extracting time series or subsetting the grid). That index can be built when the object is loaded, built when the method is first called, or loaded from a disk serialization... it would be up to the implementing library.
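To make the three options concrete, here is a minimal sketch, not anything pysgrid or pyugrid actually does. `BucketIndex` and `Grid` are hypothetical names, and a simple bucket hash stands in for a real spatial tree so the example only needs the standard library.

```python
import pickle


class BucketIndex:
    """Toy spatial index (assumption: square buckets, not a real tree)."""

    def __init__(self, points, cell_size=1.0):
        self.points = list(points)
        self.cell_size = cell_size
        self.buckets = {}
        for i, (x, y) in enumerate(self.points):
            self.buckets.setdefault(self._key(x, y), []).append(i)

    def _key(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def query_box(self, xmin, ymin, xmax, ymax):
        """Return sorted indices of the points inside the box."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for i in self.buckets.get((kx, ky), ()):
                    x, y = self.points[i]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(i)
        return sorted(hits)


class Grid:
    """Hypothetical grid object showing three ways to obtain the index."""

    def __init__(self, nodes, index=None, build_index_now=False):
        self.nodes = list(nodes)
        self._index = index                        # restored from disk
        if build_index_now and index is None:
            self._index = BucketIndex(self.nodes)  # built at load time

    @property
    def index(self):
        if self._index is None:                    # built on first use
            self._index = BucketIndex(self.nodes)
        return self._index

    def subset(self, xmin, ymin, xmax, ymax):
        return self.index.query_box(xmin, ymin, xmax, ymax)


nodes = [(0.5, 0.5), (1.5, 0.5), (2.5, 2.5), (0.2, 1.8)]
grid = Grid(nodes)                        # no index built yet
inside = grid.subset(0.0, 0.0, 2.0, 1.0)  # index is built here

# Round-trip the index through pickle to mimic a disk serialization.
restored = pickle.loads(pickle.dumps(grid.index))
grid2 = Grid(nodes, index=restored)
```

The caller only ever sees `subset()`, so which of the three strategies a library picks stays an implementation detail.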
> some sort of spatial tree index.
@kwilcox I am just curious. You seem to use rtree a lot instead of the more popular kdtree. What are your reasons behind it?
Also, most of the time we use naive trees, meaning trees that do not take the underlying projection into account. I am not an expert, but I do know that this approach can yield wrong results in certain cases.
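As a rough illustration of one such case, here is a sketch that compares a "nearest" lookup done on raw lon/lat degrees with one done on great-circle distance. It assumes a spherical Earth of radius 6371 km, and the function names and sample points are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def degree_distance(lon1, lat1, lon2, lat2):
    """Naive distance: treats lon/lat degrees as flat Cartesian units."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Query point and two candidate nodes at high latitude, as (lon, lat).
query = (0.0, 80.0)
candidates = {"A": (5.0, 80.0), "B": (0.0, 82.0)}

naive_nearest = min(
    candidates, key=lambda k: degree_distance(*query, *candidates[k]))
geodesic_nearest = min(
    candidates, key=lambda k: haversine_km(*query, *candidates[k]))

print(naive_nearest, geodesic_nearest)  # B A
```

At 80 degrees north, five degrees of longitude is under 100 km while two degrees of latitude is over 200 km, so the naive lookup picks the wrong node.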
PS: Note that with PR https://github.com/ioos/conda-recipes/pull/452 we finally have rtree on Windows. I am just not sure if it is working properly... Calling for testers!
@kwilcox: yup -- good point, this is implementation detail, not API.
@ocefpaf: kdtree is for points, r-tree is for bounding boxes. If you want to know what cell you are in, you can use a kdtree to find the closest point, but that doesn't actually tell you what cell you're in. You still need to check and search around that location for the right cell.
So an r-tree may be an easier option.
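A small sketch of the difference, with linear scans standing in for the real kdtree and r-tree so it runs on the standard library alone. The mesh and all function names are hypothetical. The nearest node here belongs only to a cell that does not contain the query point, while a bounding-box filter followed by an exact point-in-triangle test finds the right cell.

```python
import math

# Two thin triangles sharing the edge between nodes 0 and 1.
nodes = [(0.0, 0.0), (10.0, 0.0), (5.0, 1.0), (5.0, -0.2)]
faces = [(0, 1, 2), (0, 3, 1)]


def nearest_node(p):
    """Brute-force stand-in for a kdtree nearest-point query."""
    return min(range(len(nodes)),
               key=lambda i: math.hypot(p[0] - nodes[i][0],
                                        p[1] - nodes[i][1]))


def cells_touching(node):
    """Cells that have the given node as a vertex."""
    return [f for f, tri in enumerate(faces) if node in tri]


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, a, b, c):
    """True if p is inside or on the edge of triangle abc."""
    d = (_cross(a, b, p), _cross(b, c, p), _cross(c, a, p))
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


def bbox_candidates(p):
    """Linear-scan stand-in for an r-tree query on cell bounding boxes."""
    hits = []
    for f, tri in enumerate(faces):
        xs = [nodes[i][0] for i in tri]
        ys = [nodes[i][1] for i in tri]
        if min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys):
            hits.append(f)
    return hits


def locate_cell(p):
    """Bounding-box filter first, then the exact containment test."""
    for f in bbox_candidates(p):
        if point_in_triangle(p, *(nodes[i] for i in faces[f])):
            return f
    return None


p = (5.0, 0.1)
print(nearest_node(p))                  # 3
print(cells_touching(nearest_node(p)))  # [1]
print(locate_cell(p))                   # 0
```

Even with the bounding-box route, the exact containment test is still needed, since a point can fall inside a cell's box but outside the cell itself.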
Note that we have a new implementation of a "cell tree", which was designed specifically to find cells in unstructured grids. We have it for triangular meshes and quad meshes (so it can be used for curvilinear grids). It is not in a public repo yet, but the plan was to use it in pyugrid.
Also -- pyugrid currently has an optional kdtree for use in the nearest point method.
Thanks @ChrisBarker-NOAA.
In the use case doc, there is:
Question: I think the coordinate data needs to be loaded to do anything. So is it so bad to load it up into memory when the object is created?
Clearly, lazy loading is critical for all the data associated with the grid -- it can be huge, and a given use case may not need all of it anyway.
NOTE: py_ugrid currently does load up the grid info up front.