Open paleolimbot opened 4 years ago
I can't get this to "just work"...for another day.
Something like StanHeaders might work better for the C++ interface
For what it's worth I just got this working in my fork, so I think this approach is worth pursuing?
In this cppFunction()
I can 'depend' and 'include' the geom.h
, and call the c++ code directly
library(Rcpp)
cppFunction(
depends = 'geom'
, includes = '#include "geom.h"'
, code = '
Rcpp::IntegerVector rcpp_orientation_index(
Rcpp::NumericVector ax, NumericVector ay,
Rcpp::NumericVector bx, NumericVector by,
Rcpp::NumericVector px, NumericVector py
) {
return geom::cpp_orientation_index(ax, ay, bx, by, px, py);
}
'
)
## I don't actually know what this function is supposed to do, but this shows it's working and can be
## linked to by another package
rcpp_orientation_index(1,2,3,3,2,1)
-1
Classic triangle orientation test http://www.cs.tufts.edu/comp/163/OrientationTests.pdf
Yeah! That's what I wanted! I tried it briefly and it worked when I compiled it but failed the CMD check and I don't know why.
What would we do without you, @mdsumner...I too had no idea what that function did.
As a side note, I just moved all the coersion and geometry construction stuff to geovctrs...WKB->GEOS is really fast, so I figured this package should just read/write WKB (relying on geovctrs::as_geo_wkb()
to support other formats). I was hoping that it would reduce the C++ code needed for this package, but really the whole Provider
/Exporter
/Operator
stack has to be here too, so maybe supporting all formats at the C++ level is just as easy.
heh, I use it here, was trying to remember where - just in R because I'm obsessed with vectorization: https://github.com/hypertidy/silicate/blob/05860d470f9a2dad5fc25237dc130e2ecc37de87/R/triangles.R though I abs() away the orientation because I just wanted the area
Packing and unpacking WKB is one of those core things I wanted out of sf - and you've done it! We at least need some kind of guiding discussion document about what core tools we want, how we see they should work, and who's doing it already ;)
For sure! There's a few things I'd like to get sorted between this and geovctrs, mostly to see if they're possible, then I think it's ready for some more in depth discussion.
I just moved all the coersion and geometry construction stuff
I see there's some cross-over in the .cpp & .h files between the two libraries (and also probably with rgeos ?)
Would it be worth having one interface package which has all, and only the underlying .c / .cpp / .hpp files, then other libraries can call and build on top of that?
(for examples I have the rapidjsonr and rapidxmlr as interface-only libraries, with no R functions in them)
I have no idea how to do that (I barely know C++), but that would be ideal. I haven't gotten around to removing the functionality in this package that I moved over to geovctrs, but some of the infrastructure around reading and writing would be nice to have in both. Even better if that means we get the newest, shiniest GEOS on all platforms (Ubuntu 16.04 is stuck on GEOS 3.5 by default, which causes some problems).
I don't know if I fully understand the examples (my ignorance, not your code)...I see the header files in inst/
, but where are the .cpp
files?
but where are the .cpp files
There aren't any. The header files are both the declaration and definition of the functions.
Using this structure other R packages can link to the header files by specifying the LinkingTo in the DESCRIPTION, then all the C++ code is available.
If we take rapidjsonr
and jsonify
as an example
rapidjsonr
is a header-only C++ library, stored in /inst/include
jsonify
declares LinkingTo in the descriptionjsonify
can include the header files from rapidjsonr
in its own .cpp / .hpp files#ifndef R_JSONIFY_H
#define R_JSONIFY_H
#include <Rcpp.h>
/* // [[Rcpp::depends(rapidjsonr)]] */
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/prettywriter.h"
#include "jsonify/to_json/api.hpp"
#include "jsonify/from_json/api.hpp"
#endif
Similarly, i've written all the jsonify
has headers, so if another R package wants to convert JSON <--> R, then can 'LinkTo' jsonify and they'll only have to all jsonify::api::to_json()
/ jsonify::api::from_json()
Rcpp is similar - it's (mostly) header-only, which means to use it in a package you only need to specify the LinkingTo:
, then you can use #include <Rcpp.h>
in your own package.
For reference, the rapidjson library is here. You can explore /include/rapidjson/
and see all the functions.
And while I'm at it, this is also my design philosophy with sfheaders - it's a C++ header-only R package. So other libraries can LinkTo this library and use all the underlying C++ code directly from their own C++ code, without having to go back and forth between R.
So if you had an Rcpp list of matrices you wanted converting into a sf_POLYGON
, you can call sfheaders
to do all the work, something like (completely untested and made-up code)
#include "sfheaders/sf/sf_polygon"
Rcpp::List lm( 2 );
lm[0] = Rcpp::NumericMatrix(2,2);
lm[1] = Rcpp::NumericMatrix(3,3);
Rcpp::DataFrame sf = sfheaders::sf::sf_polygon( lm );
Awesome! I appreciate the hand-holding. I'll take a stab at it and see how it goes!
I'm very keen on this, so happy to help!
I got held up on trying to make NA
, EMPTY, and infinite values to work, but the C++ is coming. I've put some thoughts down in one of the doc pages about what I was thinking when I wrote geovctrs and would love feedback from you both if you're up for it!
https://paleolimbot.github.io/geovctrs/reference/is_geovctr.html
I don't fully understand / follow those docs, but that's mainly because I haven't followed the concept of vctrs
, nor do I use tibbles
, so a lot of it went over my head.
My main requirement for a 'modular' spatial ecosystem is that all the underlying C++ code is exposed and accessible from other packages.
@paleolimbot looks good, I still haven't tried much but hoping to soon. I know the vctrs allows far better efficiency and extensibility, I think the headers-stuff can come later (I'd also like to do the same for vapour, PROJ, decido, RTriangle ...)
We're getting somewhere! I think the interface should live here instead of geovctrs? geovctrs is based on GEOS but doesn't have to be. I don't think there's a way to easily include the GEOS sources in an R package (or if there is, I'm not the one to do it).
Rcpp::sourceCpp(
code = '
#include "geovctrs/geos-operator.hpp"
// [[Rcpp::depends(geovctrs)]]
class BufferOp: public UnaryGeometryOperator {
GEOSGeometry* operateNext(GEOSContextHandle_t context, GEOSGeometry* geometry, size_t i) {
// width, segments per quandrant
return GEOSBuffer_r(context, geometry, 2, 30);
}
};
// [[Rcpp::export]]
SEXP buffer2(SEXP data, SEXP ptype) {
BufferOp op;
op.initProvider(data);
op.initExporter(ptype);
return op.operate();
}
'
)
buffer2(geovctrs::geo_wkt("POINT (30 10)"), geovctrs::geo_wkt())
#> <geovctrs_wkt[1]>
#> [1] POLYGON ((32 10, 31.99725906950915 9.89…
Created on 2020-04-09 by the reprex package (v0.3.0)
Sorry for the emotional roller coaster here...I'm keeping the tiny header-only library here and exporting C++ functions using Rcpp::interfaces(r, cpp) for most functions. The general theme is that each geo_*_()
function in R has a geovctrs_cpp_*()
equivalent. This will be use-at-your-own-risk and undocumented for probably a long time, but if or when either of you feel it needs something else, feel free to PR. If it needs to be 100% different that's cool too...having never used a C++ API, I don't know what makes a good one.
Using the exported C++ functions:
Rcpp::sourceCpp(
code = '
#include "geovctrs.h"
// [[Rcpp::depends(geovctrs)]]
// [[Rcpp::export]]
Rcpp::IntegerVector ncoords(SEXP data) {
return geovctrs::geovctrs_cpp_n_coordinates(data);
}
'
)
ncoords(geovctrs::geo_wkt("POINT (30 10)"))
#> [1] 1
Created on 2020-04-10 by the reprex package (v0.3.0)
Using the GeovctrsOperator
classes to do GEOS stuff without worrying about whether or not you have to call GEOSGeom_destory_r()
:
Rcpp::sourceCpp(
code = '
#include "geovctrs/operator.hpp"
// [[Rcpp::depends(geovctrs)]]
class BufferOp: public GeovctrsGeometryOperator {
GEOSGeometry* operateNext(GEOSContextHandle_t context, GEOSGeometry* geometry, size_t i) {
// width, segments per quandrant
return GEOSBuffer_r(context, geometry, 2, 30);
}
};
// [[Rcpp::export]]
SEXP buffer2(SEXP data, SEXP ptype) {
BufferOp op;
op.initProvider(data);
op.initExporter(ptype);
return op.operate();
}
'
)
buffer2(geovctrs::geo_wkt("POINT (30 10)"), geovctrs::geo_wkt())
#> <geovctrs_wkt[1]>
#> [1] POLYGON ((32 10, 31.99725906950915 9.89…
Created on 2020-04-10 by the reprex package (v0.3.0)
Using the GeovctrsGEOSHandler
if you really want to go low-level without worrying about destroying the GEOSContextHandle_t
:
Rcpp::sourceCpp(
code = '
#include "geovctrs/geos-handler.hpp"
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::depends(geovctrs)]]
// [[Rcpp::export]]
IntegerVector geos_orientation(NumericVector Ax, NumericVector Ay,
NumericVector Bx, NumericVector By,
NumericVector Px, NumericVector Py) {
// constructor and deleter take care of initializing
// and destroying the GEOS context (if I got it right...)
GeovctrsGEOSHandler handler;
IntegerVector out(Ax.size());
for (int i=0; i < Ax.size(); i++) {
out[i] = GEOSOrientationIndex_r(
handler.context,
Ax[i], Ay[i],
Bx[i], By[i],
Px[i], Py[i]
);
}
return out;
}
'
)
geos_orientation(0, 0, 10, 0, 10, 10)
#> [1] 1
geos_orientation(0, 0, 10, 0, 10, -10)
#> [1] -1
Created on 2020-04-10 by the reprex package (v0.3.0)
Excellent work! I (SymbolixAU) will happily contribute to the C++ API. I'll let your emotions calm down a bit first though and wait for you to stabalise your side of it, then I'll jump in if that's OK?
Awesome! And good call...I'm still a bit trigger happy on the refactoring, but it's coming together.
Just fyi, I'm back to catch up again after a mammoth CRAN release fest - the discussion here is good context to get my bearings.
There's a couple new thoughts that haven't made it here yet!
I've never considered that, I thought that was kind of a no-go in Linux, you just have to suck it up - no other package does this, or do they? (I've been dreaming of a tinygdal with minimal drivers . . . and CGAL is header only now so ...)
I think you can export a C API with C++ internals with runtime linking. This package doesn't use any new fancy GEOS stuff but it would be a bummer to make an geogeos package whose binary version functionality was entirely dependent on what version of GEOS CRAN decided to install that day.
Here's a quick sketch of what I think the package ecosystem could look like.
Where the "C / C++ only" level is highly generic & templated, so any (appropriate) SEXP object will work in it. And they're independent of R attributes / classes.
Then at the "R / C / C++" level, this is where various R classes and attributes are added, such as sfg
, sfc
, geovctrs
.
Then the "R code" is where people develop packages to make use of the various R-class spatial objects.
This is an intentionally simplistic diagram. In reality there will be lines going to & from many different packages.
YES
I like it! I'm aware that I'm developing something like a devtools
here...it'll need to be split up but it's helpful to develop the pieces together.
Apparently this took me a few months, but geovctrs now doesn't use GEOS, anyway. Developing wk and s2 really helped scope this a bit better...no plotting or printing in geovctrs anymore, and no well-known geometry types (they all live in wk).
Still not sure any of this is organized properly, but it's hard to without writing it first!
I've hit the end of my to-do list for geovctrs, although I'm sure I'll find more to fiddle with. Some things I'm thinking about related to this, in no particular order:
geo_xy()
or matrix()
will always be best for points, sfc_(MULTI)POLYGON
works really well for polygons. Everything else is somewhere inbetween, and wkb is almost always good enough. External pointers to C++ objects are also really efficient (e.g., in s2 and maybe a GEOS or Boost geometry wrapper in the future), especially when you can do lazy indexing. They are, of course, limited to whatever the C++ library thought was important (e.g., no M coord in GEOS).R_RegisterCCallable()
is probably sustainable only if the library already has one (e.g., it's somebody else's job to maintain backward compatibility). GEOS and PROJ will probably work well (if CRAN is willing to accept them, considering they're already installed on CRAN machines); something like s2 would be too much work. I haven't found much utility in C APIs exposed via // Rcpp::interfaces(cpp)
...it coerces the caller into depending on Rcpp, and for geometries, most operations can be phrased nicely through vector operations where calling through R isn't time-limiting.Just thinking aloud here!
I think {geometries}
is now pretty stable too. The main function for creating geometries is available through R and C++.
The high-level design is
StringVector
which will get added on to each individual geometrydf <- ggplot2::map_data("state")
## Using 1-index base
geom <- geometries::gm_geometries(df, c(5), c(1,2))
geom[[8]]
# [,1] [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
And with a class
geom <- geometries::gm_geometries(df, c(5), c(1,2), "my_amazing_class")
geom[[8]]
# [,1] [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
# attr(,"class")
# [1] "my_amazing_class"
Rcpp::cppFunction(
depends = "geometries"
, includes = '#include "geometries/geometries.hpp"'
, code = '
SEXP make_geometry( SEXP x, SEXP id_cols, SEXP geometry_cols, Rcpp::Nullable< Rcpp::StringVector > cls_attr ) {
return geometries::make_geometries( x, id_cols, geometry_cols, cls_attr );
}
'
)
## using 0-index base
geom <- make_geometry( df, c(4L), c(0L,1L), NULL )
geom[[8]]
# [,1] [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
And with a class
geom <- make_geometry( df, c(4L), c(0L,1L), "my_line_class" )
geom[[8]]
# [,1] [,2]
# [1,] -77.13731 38.94394
# [2,] -77.06283 38.99551
# [3,] -77.01699 38.97259
# [4,] -76.93105 38.89238
# [5,] -77.05136 38.80643
# [6,] -77.05136 38.82935
# [7,] -77.06283 38.86373
# [8,] -77.07428 38.88664
# [9,] -77.09720 38.90956
# [10,] -77.13731 38.94394
# attr(,"class")
# [1] "my_line_class"
Nice! I am going to revisit how to leverage this in wkutils next week to prep for the first CRAN release (which will be followed by geovctrs with any luck).
Any chance that the class
arg could be a generic list of attributes? Like...list(class = "something", dim = "XYZ")
?
I forgot about this discussion 😃
This is pretty much just adding
at the top of every file that contains
It would be nice to export some functions that work on
GEOSGeometry*
, but most exported functions can work on WKB vectors, for which conversion is fast (3 million/sec). I don't think there's any way to export the actual GEOS library or C++ classes (unfortunate, because exporting the Operator classes would make it really easy people to write their own operators).