paleolimbot / geos

Open Source Geometry Engine ('GEOS') R API
https://paleolimbot.github.io/geos/

prospects #2

Closed mdsumner closed 4 years ago

mdsumner commented 4 years ago

Neat work here! I've been toying with vctrs handling for geometry in related ways, but you've taken this quite a long way. Just sharing some thoughts; no rush on any of this, I just wanted to lay it out.

I'd like to have a very barebones GEOS package, one that doesn't add any new types or helpers but is just low-level calls to what the library functions can do. This is similar to my vapour and PROJ packages: I want access to these facilities without going through an R level of interpretation. I don't consider those user packages; they are for developing other ideas.

The R levels and usability are really important! But they are relatively easy to add and should be in higher-level package/s IMO.

It would mean pulling out the lower-level GEOS stuff here into a more "library" package, and I know that's not always straightforward, but I'm keen to explore it if you are.

One thing I wish about sf is that it were built on lower-level foundations. One is lazy reading (I've tried this in https://github.com/mdsumner/RGDALSQL - so no reading until collect()); the next is to keep geometries in native form (wkb, wkt, geojson) and only unpack them in R when needed. That's closer to what you have here, I think. For example, if GDAL is involved, then use it to pass WKB into the user front-end, pass those directly to GEOS for some tasks, and unpack them in R for others. (vapour can scan-read wkb or wkt directly without the attributes - other times we work directly with WKT from other sources etc.)
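To make the "keep geometries in native form, unpack only when needed" idea concrete, here is a minimal sketch in plain C++ that inspects a WKB value without decoding its coordinates. It relies only on the standard WKB layout (one byte-order byte, then a 32-bit geometry type code). The names `WkbHeader`, `read_uint32`, and `peek_wkb_header` are hypothetical and do not come from geos, sf, vapour, or GDAL.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical summary of the first five bytes of a WKB value.
struct WkbHeader {
    bool little_endian;           // byte-order flag: 1 = little endian, 0 = big endian
    std::uint32_t geometry_type;  // e.g. 1 = Point, 2 = LineString, 3 = Polygon
};

// Read a 32-bit unsigned integer stored in the given byte order.
inline std::uint32_t read_uint32(const unsigned char* p, bool little_endian) {
    assert(p != nullptr);
    if (little_endian) {
        return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    }
    return std::uint32_t(p[3]) | (std::uint32_t(p[2]) << 8) |
           (std::uint32_t(p[1]) << 16) | (std::uint32_t(p[0]) << 24);
}

// Look at the header only; the coordinate bytes are left untouched so the
// buffer can still be handed on (for example to GEOS) as it is.
inline WkbHeader peek_wkb_header(const std::vector<unsigned char>& wkb) {
    if (wkb.size() < 5) {
        throw std::invalid_argument("WKB value is too short to contain a header");
    }
    if (wkb[0] > 1) {
        throw std::invalid_argument("invalid WKB byte-order flag");
    }
    const bool little_endian = (wkb[0] == 1);
    return WkbHeader{little_endian, read_uint32(wkb.data() + 1, little_endian)};
}
```

A caller could use the header to decide whether a geometry needs to be unpacked at all (say, filtering to polygons) while the raw bytes stay in their native form.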

paleolimbot commented 4 years ago

Thanks! I'm still unsure what this particular package is trying to do...trying to learn C++ and vctrs is a big part of it. I think I also was trying to see if I could make ggplot2::coord_munch() go faster without converting from a data frame with coordinates to sf to GEOS and back, get a really great labeler that required GEOS to work with grid objects, and try to simplify objects before plotting them to make it go faster.

What it's turned into is a collection of common in-memory formats and ways to convert between them. As I think about "lower level" GEOS stuff, it's hard to imagine it being useful without the ability to import and export various formats, and to have those formats work as tibble columns (which is where vctrs fits in). I think I went a bit overboard with the geo_tbl stuff, but it's the format that ggplot2, grid, and graphics devices (sort of) use, so it seemed like there should be some kind of native import/export.

I'm happy for you to make this whatever you'd like or create something new...I don't have a clear handle on the value of something even lower level than this (if or when unary, binary, and predicate operations from GEOS get implemented), but I'd be happy to help if you create something new (although my C++ is definitely being learned as I go).

I put a preliminary "GeometryProvider" sketch here:

https://github.com/paleolimbot/geom/commit/cf66ef90c533d4bd8da3bf3b166bc0105d21c564

Basically so that operations only have to get defined once, and other packages could potentially subclass a GeometryProvider/Exporter to access GEOS functionality (without having to deal with GEOS internals). I started programming in Java where this kind of thing is common...not knowing much about C/C++, it's very possible that there is a better way.
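As a rough illustration of that provider/exporter idea (not the code in the linked commit), the sketch below defines an operation once against two abstract interfaces, so that any input or output format only has to subclass them. Everything apart from the `GeometryProvider` name is an assumption made for the example: `Geom` is a stand-in for a real GEOS geometry handle, and `GeometryExporter`, `translate_all`, `XYProvider`, and `XYExporter` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a geometry; a real version would wrap a GEOS geometry handle.
struct Geom {
    double x;
    double y;
};

// Supplies geometries one at a time from some input format.
class GeometryProvider {
public:
    virtual ~GeometryProvider() = default;
    virtual std::size_t size() const = 0;
    virtual Geom get(std::size_t i) const = 0;
};

// Receives result geometries and stores them in some output format.
class GeometryExporter {
public:
    virtual ~GeometryExporter() = default;
    virtual void put(const Geom& g) = 0;
};

// An operation defined once, knowing nothing about concrete formats.
inline void translate_all(const GeometryProvider& in, GeometryExporter& out,
                          double dx, double dy) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Geom g = in.get(i);
        out.put(Geom{g.x + dx, g.y + dy});
    }
}

// Hypothetical concrete provider backed by an in-memory vector of points.
class XYProvider : public GeometryProvider {
public:
    explicit XYProvider(std::vector<Geom> pts) : pts_(std::move(pts)) {}
    std::size_t size() const override { return pts_.size(); }
    Geom get(std::size_t i) const override { return pts_.at(i); }

private:
    std::vector<Geom> pts_;
};

// Hypothetical concrete exporter that collects results into a vector.
class XYExporter : public GeometryExporter {
public:
    void put(const Geom& g) override { points.push_back(g); }
    std::vector<Geom> points;
};
```

Supporting another format (WKT, WKB, a coordinate table) would then mean adding one provider or exporter subclass, with `translate_all` and every other operation left unchanged.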

Laziness would also be awesome! The functions in this package kind of "mark" vectors rather than convert them, which might work for what you're trying to do? I mostly write papers about metals in lakes, so I'm not great with knowing the GIS use cases.

paleolimbot commented 4 years ago

It does some cool things now! You might be interested in this part, which outputs GEOS geometries to something like a data frame. I have to quit on this for a bit to write marginally interesting academic papers, but thought I'd give an update in case it's useful!

mdsumner commented 4 years ago

Definitely, thanks! I shamelessly copied your approach into mdsumner/libgeos. I didn't have strong plans for where to go, but your code is very helpful for getting into this.

paleolimbot commented 4 years ago

Keep it up! I don't have a strong plan here either...mostly I just wanted to plot so that I could see whether or not the operators worked, and that turned into a three-day rabbit hole extracting coordinates.

CWen001 commented 4 years ago

Sorry to ask a question here. Is this package a potential candidate to accelerate sf, comparable to how PyGEOS speeds up GeoPandas using vectorized geometry functions? Also see https://github.com/pygeos/pygeos https://github.com/geopandas/geopandas/issues/1155

paleolimbot commented 4 years ago

No idea! It's certainly possible. But it could also stand alone to provide additional functionality/size stability for those who want it. I think the latter is where it should be until the implementations are battle tested (sf's implementations have seen a lot of use and are more likely to hold up).

paleolimbot commented 4 years ago

Right now I'm considering splitting the package...there are a lot of functions that are useful outside of the GEOS context, but until I finish implementing all the GEOS functions, I won't know what they are.

mdsumner commented 4 years ago

Cool stuff man, excited to see the scope of having all of the library available. Hopefully I can get in and contribute soonish

CWen001 commented 4 years ago

Really excited to see the package, and hope it can be a key component of the big-spatial-data solution in R!

mdsumner commented 4 years ago

@CWen001 I'm always interested to hear what folks have trouble with for scaling; there are lots of tricks if you want to outline a problem (not here, but in a gist or new repo, say)

(there are heaps of options these days, not just the one-size-fits-all of the original sp world)

paleolimbot commented 4 years ago

I think I've got it so that implementing new functions isn't too bad...the "providers", "exporters", and "operators" are a bit of a mess, but at least they're a mess in one place.

Implementing the whole API and sticking fairly close to the functionality that's exposed is definitely the way to go...anything else is too close to what sf already does very well. The geo_wkt, geo_wkb, and geo_coord_* classes probably don't belong here, but they are necessary. Perhaps the "geovctrs" package? It's hard to know where they fit in the R spatial ecosystem.

I think I can expose a C++ API as well, but I need the features to be represented as R objects (pretty sure I can't export geos geometry types, only functions and Rcpp types).

Anyway, no rush, just squirreling away my thoughts here as I poke away at this.

SymbolixAU commented 4 years ago

I can't export geos geometry types

You should be able to Rcpp::wrap() the types, however, which will then make them available in R / Rcpp. I had a go at this with mapbox geometry types.

paleolimbot commented 4 years ago

Thanks @SymbolixAU! That's super helpful.

paleolimbot commented 4 years ago

Poking away at this in raw C, which isn't too bad because the GEOS C API is really stable and really well done. All the platform-specific details are now squirreled away in the libgeos package (on CRAN, but currently in a battle with CRAN to keep it there). If you have a recent GEOS install already (>3.8.0), this should install in 10s or so!