paleolimbot / geovctrs

Common Classes and Data Structures for Geometry Vectors
https://paleolimbot.github.io/geovctrs
Other
25 stars 1 forks source link

C++ interface #8

Open paleolimbot opened 4 years ago

paleolimbot commented 4 years ago

This is pretty much just adding

// [[Rcpp::interfaces(r, cpp)

at the top of every file that contains

// [[Rcpp::export]]

It would be nice to export some functions that work on GEOSGeometry*, but most exported functions can work on WKB vectors, for which conversion is fast (3 million/sec). I don't think there's any way to export the actual GEOS library or C++ classes (unfortunate, because exporting the Operator classes would make it really easy people to write their own operators).

paleolimbot commented 4 years ago

I can't get this to "just work"...for another day.

paleolimbot commented 4 years ago

Something like StanHeaders might work better for the C++ interface

https://github.com/cran/StanHeaders

SymbolixAU commented 4 years ago

For what it's worth I just got this working in my fork, so I think this approach is worth pursuing?

In this cppFunction() I can 'depend' and 'include' the geom.h, and call the c++ code directly

library(Rcpp)

cppFunction(
  depends = 'geom'
  , includes = '#include "geom.h"'
  , code = '
  Rcpp::IntegerVector rcpp_orientation_index(
    Rcpp::NumericVector ax, NumericVector ay,
    Rcpp::NumericVector bx, NumericVector by,
    Rcpp::NumericVector px, NumericVector py
  ) {
    return geom::cpp_orientation_index(ax, ay, bx, by, px, py);
  }
  '
)

## I don't actually know what this function is supposed to do, but this shows it's working and can be 
## linked to by another package
rcpp_orientation_index(1,2,3,3,2,1)
-1
mdsumner commented 4 years ago

Classic triangle orientation test http://www.cs.tufts.edu/comp/163/OrientationTests.pdf

paleolimbot commented 4 years ago

Yeah! That's what I wanted! I tried it briefly and it worked when I compiled it but failed the CMD check and I don't know why.

What would we do without you, @mdsumner...I too had no idea what that function did.

As a side note, I just moved all the coersion and geometry construction stuff to geovctrs...WKB->GEOS is really fast, so I figured this package should just read/write WKB (relying on geovctrs::as_geo_wkb() to support other formats). I was hoping that it would reduce the C++ code needed for this package, but really the whole Provider/Exporter/Operator stack has to be here too, so maybe supporting all formats at the C++ level is just as easy.

mdsumner commented 4 years ago

heh, I use it here, was trying to remember where - just in R because I'm obsessed with vectorization: https://github.com/hypertidy/silicate/blob/05860d470f9a2dad5fc25237dc130e2ecc37de87/R/triangles.R though I abs() away the orientation because I just wanted the area

Packing and unpacking WKB is one of those core things I wanted out of sf - and you've done it! We at least need some kind of guiding discussion document about what core tools we want, how we see they should work, and who's doing it already ;)

paleolimbot commented 4 years ago

For sure! There's a few things I'd like to get sorted between this and geovctrs, mostly to see if they're possible, then I think it's ready for some more in depth discussion.

SymbolixAU commented 4 years ago

I just moved all the coersion and geometry construction stuff

I see there's some cross-over in the .cpp & .h files between the two libraries (and also probably with rgeos ?)
Would it be worth having one interface package which has all, and only the underlying .c / .cpp / .hpp files, then other libraries can call and build on top of that?

(for examples I have the rapidjsonr and rapidxmlr as interface-only libraries, with no R functions in them)

paleolimbot commented 4 years ago

I have no idea how to do that (I barely know C++), but that would be ideal. I haven't gotten around to removing the functionality in this package that I moved over to geovctrs, but some of the infrastructure around reading and writing would be nice to have in both. Even better if that means we get the newest, shiniest GEOS on all platforms (Ubuntu 16.04 is stuck on GEOS 3.5 by default, which causes some problems).

I don't know if I fully understand the examples (my ignorance, not your code)...I see the header files in inst/, but where are the .cpp files?

SymbolixAU commented 4 years ago

but where are the .cpp files

There aren't any. The header files are both the declaration and definition of the functions.

Using this structure other R packages can link to the header files by specifying the LinkingTo in the DESCRIPTION, then all the C++ code is available.

If we take rapidjsonr and jsonify as an example

#ifndef R_JSONIFY_H
#define R_JSONIFY_H

#include <Rcpp.h>

/* // [[Rcpp::depends(rapidjsonr)]] */

#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/prettywriter.h"
#include "jsonify/to_json/api.hpp"
#include "jsonify/from_json/api.hpp"

#endif

Similarly, i've written all the jsonify has headers, so if another R package wants to convert JSON <--> R, then can 'LinkTo' jsonify and they'll only have to all jsonify::api::to_json() / jsonify::api::from_json()


Rcpp is similar - it's (mostly) header-only, which means to use it in a package you only need to specify the LinkingTo:, then you can use #include <Rcpp.h> in your own package.


For reference, the rapidjson library is here. You can explore /include/rapidjson/ and see all the functions.

SymbolixAU commented 4 years ago

And while I'm at it, this is also my design philosophy with sfheaders - it's a C++ header-only R package. So other libraries can LinkTo this library and use all the underlying C++ code directly from their own C++ code, without having to go back and forth between R.

So if you had an Rcpp list of matrices you wanted converting into a sf_POLYGON, you can call sfheaders to do all the work, something like (completely untested and made-up code)

#include "sfheaders/sf/sf_polygon"

Rcpp::List lm( 2 );
lm[0] = Rcpp::NumericMatrix(2,2);
lm[1] = Rcpp::NumericMatrix(3,3);
Rcpp::DataFrame sf = sfheaders::sf::sf_polygon( lm );
paleolimbot commented 4 years ago

Awesome! I appreciate the hand-holding. I'll take a stab at it and see how it goes!

SymbolixAU commented 4 years ago

I'm very keen on this, so happy to help!

paleolimbot commented 4 years ago

I got held up on trying to make NA, EMPTY, and infinite values to work, but the C++ is coming. I've put some thoughts down in one of the doc pages about what I was thinking when I wrote geovctrs and would love feedback from you both if you're up for it!

https://paleolimbot.github.io/geovctrs/reference/is_geovctr.html

SymbolixAU commented 4 years ago

I don't fully understand / follow those docs, but that's mainly because I haven't followed the concept of vctrs, nor do I use tibbles, so a lot of it went over my head.

My main requirement for a 'modular' spatial ecosystem is that all the underlying C++ code is exposed and accessible from other packages.

mdsumner commented 4 years ago

@paleolimbot looks good, I still haven't tried much but hoping to soon. I know the vctrs allows far better efficiency and extensibility, I think the headers-stuff can come later (I'd also like to do the same for vapour, PROJ, decido, RTriangle ...)

paleolimbot commented 4 years ago

We're getting somewhere! I think the interface should live here instead of geovctrs? geovctrs is based on GEOS but doesn't have to be. I don't think there's a way to easily include the GEOS sources in an R package (or if there is, I'm not the one to do it).

Rcpp::sourceCpp(
  code = '
  #include "geovctrs/geos-operator.hpp"

  // [[Rcpp::depends(geovctrs)]]

  class BufferOp: public UnaryGeometryOperator {
    GEOSGeometry* operateNext(GEOSContextHandle_t context, GEOSGeometry* geometry, size_t i) {
      // width, segments per quandrant
      return GEOSBuffer_r(context, geometry, 2, 30);
    }
  };

  // [[Rcpp::export]]
  SEXP buffer2(SEXP data, SEXP ptype) {
    BufferOp op;
    op.initProvider(data);
    op.initExporter(ptype);
    return op.operate();
  }
  '
)

buffer2(geovctrs::geo_wkt("POINT (30 10)"), geovctrs::geo_wkt())
#> <geovctrs_wkt[1]>
#> [1] POLYGON ((32 10, 31.99725906950915 9.89…

Created on 2020-04-09 by the reprex package (v0.3.0)

paleolimbot commented 4 years ago

Sorry for the emotional roller coaster here...I'm keeping the tiny header-only library here and exporting C++ functions using Rcpp::interfaces(r, cpp) for most functions. The general theme is that each geo_*_() function in R has a geovctrs_cpp_*() equivalent. This will be use-at-your-own-risk and undocumented for probably a long time, but if or when either of you feel it needs something else, feel free to PR. If it needs to be 100% different that's cool too...having never used a C++ API, I don't know what makes a good one.

Using the exported C++ functions:

Rcpp::sourceCpp(
  code = '
  #include "geovctrs.h"

  // [[Rcpp::depends(geovctrs)]]

  // [[Rcpp::export]]
  Rcpp::IntegerVector ncoords(SEXP data) {
    return geovctrs::geovctrs_cpp_n_coordinates(data);
  }
  '
)

ncoords(geovctrs::geo_wkt("POINT (30 10)"))
#> [1] 1

Created on 2020-04-10 by the reprex package (v0.3.0)

Using the GeovctrsOperator classes to do GEOS stuff without worrying about whether or not you have to call GEOSGeom_destory_r():

Rcpp::sourceCpp(
  code = '
  #include "geovctrs/operator.hpp"

  // [[Rcpp::depends(geovctrs)]]

  class BufferOp: public GeovctrsGeometryOperator {
    GEOSGeometry* operateNext(GEOSContextHandle_t context, GEOSGeometry* geometry, size_t i) {
      // width, segments per quandrant
      return GEOSBuffer_r(context, geometry, 2, 30);
    }
  };

  // [[Rcpp::export]]
  SEXP buffer2(SEXP data, SEXP ptype) {
    BufferOp op;
    op.initProvider(data);
    op.initExporter(ptype);
    return op.operate();
  }
  '
)

buffer2(geovctrs::geo_wkt("POINT (30 10)"), geovctrs::geo_wkt())
#> <geovctrs_wkt[1]>
#> [1] POLYGON ((32 10, 31.99725906950915 9.89…

Created on 2020-04-10 by the reprex package (v0.3.0)

Using the GeovctrsGEOSHandler if you really want to go low-level without worrying about destroying the GEOSContextHandle_t:

Rcpp::sourceCpp(
  code = '
  #include "geovctrs/geos-handler.hpp"
  #include <Rcpp.h>
  using namespace Rcpp;

  // [[Rcpp::depends(geovctrs)]]

  // [[Rcpp::export]]
  IntegerVector geos_orientation(NumericVector Ax, NumericVector Ay, 
                                 NumericVector Bx, NumericVector By, 
                                 NumericVector Px, NumericVector Py) {
    // constructor and deleter take care of initializing
    // and destroying the GEOS context (if I got it right...)
    GeovctrsGEOSHandler handler;

    IntegerVector out(Ax.size());
    for (int i=0; i < Ax.size(); i++) {
      out[i] = GEOSOrientationIndex_r(
        handler.context,
        Ax[i], Ay[i], 
        Bx[i], By[i], 
        Px[i], Py[i]
      );
    }

    return out;
  }
  '
)

geos_orientation(0, 0, 10, 0, 10, 10)
#> [1] 1
geos_orientation(0, 0, 10, 0, 10, -10)
#> [1] -1

Created on 2020-04-10 by the reprex package (v0.3.0)

dcooley commented 4 years ago

Excellent work! I (SymbolixAU) will happily contribute to the C++ API. I'll let your emotions calm down a bit first though and wait for you to stabalise your side of it, then I'll jump in if that's OK?

paleolimbot commented 4 years ago

Awesome! And good call...I'm still a bit trigger happy on the refactoring, but it's coming together.

mdsumner commented 4 years ago

Just fyi, I'm back to catch up again after a mammoth CRAN release fest - the discussion here is good context to get my bearings.

paleolimbot commented 4 years ago

There's a couple new thoughts that haven't made it here yet!

mdsumner commented 4 years ago

I've never considered that, I thought that was kind of a no-go in Linux, you just have to suck it up - no other package does this, or do they? (I've been dreaming of a tinygdal with minimal drivers . . . and CGAL is header only now so ...)

paleolimbot commented 4 years ago

I think you can export a C API with C++ internals with runtime linking. This package doesn't use any new fancy GEOS stuff but it would be a bummer to make an geogeos package whose binary version functionality was entirely dependent on what version of GEOS CRAN decided to install that day.

SymbolixAU commented 4 years ago

Here's a quick sketch of what I think the package ecosystem could look like.

Screenshot 2020-04-21 at 8 11 35 am

Where the "C / C++ only" level is highly generic & templated, so any (appropriate) SEXP object will work in it. And they're independent of R attributes / classes.

Then at the "R / C / C++" level, this is where various R classes and attributes are added, such as sfg, sfc, geovctrs.

Then the "R code" is where people develop packages to make use of the various R-class spatial objects.


This is an intentionally simplistic diagram. In reality there will be lines going to & from many different packages.

mdsumner commented 4 years ago

YES

paleolimbot commented 4 years ago

I like it! I'm aware that I'm developing something like a devtools here...it'll need to be split up but it's helpful to develop the pieces together.

paleolimbot commented 4 years ago

Apparently this took me a few months, but geovctrs now doesn't use GEOS, anyway. Developing wk and s2 really helped scope this a bit better...no plotting or printing in geovctrs anymore, and no well-known geometry types (they all live in wk).

Still not sure any of this is organized properly, but it's hard to without writing it first!

paleolimbot commented 4 years ago

I've hit the end of my to-do list for geovctrs, although I'm sure I'll find more to fiddle with. Some things I'm thinking about related to this, in no particular order:

Just thinking aloud here!

dcooley commented 4 years ago

I think {geometries} is now pretty stable too. The main function for creating geometries is available through R and C++.

The high-level design is

R

df <- ggplot2::map_data("state")

## Using 1-index base
geom <- geometries::gm_geometries(df, c(5), c(1,2))
geom[[8]]
# [,1]     [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394

And with a class

geom <- geometries::gm_geometries(df, c(5), c(1,2), "my_amazing_class")
geom[[8]]
# [,1]     [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
# attr(,"class")
# [1] "my_amazing_class"

C++

Rcpp::cppFunction(
  depends = "geometries"
  , includes = '#include "geometries/geometries.hpp"'
  , code = '
    SEXP make_geometry( SEXP x, SEXP id_cols, SEXP geometry_cols, Rcpp::Nullable< Rcpp::StringVector >  cls_attr ) {
      return geometries::make_geometries( x, id_cols, geometry_cols, cls_attr );
    }
  '
)

## using 0-index base
geom <- make_geometry( df, c(4L), c(0L,1L), NULL )
geom[[8]]
# [,1]     [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394

And with a class

geom <- make_geometry( df, c(4L), c(0L,1L), "my_line_class" )
geom[[8]]
# [,1]     [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
# attr(,"class")
# [1] "my_line_class"

related - https://github.com/dcooley/geometries/issues/4

paleolimbot commented 4 years ago

Nice! I am going to revisit how to leverage this in wkutils next week to prep for the first CRAN release (which will be followed by geovctrs with any luck).

Any chance that the class arg could be a generic list of attributes? Like...list(class = "something", dim = "XYZ")?

dcooley commented 4 years ago

sure

mdsumner commented 3 years ago

I forgot about this discussion 😃