Open edzer opened 6 years ago
Thanks. The README did have a note about the unfairness of the sf comparison, but that got deleted along the way. We will soon integrate a proper Vincenty ellipsoidal routine, (it's currently just spherical) following which I'll fix the benchmark. In the meantime, I'll reinsert the note about the comparison not being fair
And thanks for the link - very useful
For the comparison: lwgeom
can also branch on spherical Earth, which would be more fair to compare against your spherical distances. I'm primarily interested in both packages giving identical (and hopefully correct) results.
Yeah, that would be great. At the moment we can only do exact comparison with geosphere
, but obviously critically important to also get exact lwgeom
comparison. I'll ping you for help when it's that far along - or feel free to PR anytime! We need, for example, to find the precise radius used by lwgeon
, which for geosphere
is (over) "accurate" to cm.
lwgeom
doesn't use any, but gets them through the crs passed; parameter values are obtained by sf
from GDAL, and printed by
> sf::st_crs(st_sfc(st_point(), crs = 4326), parameters = TRUE)
$SemiMajor
6378137 m
$SemiMinor
6356752 m
$InvFlattening
[1] 298.2572
$units_gdal
[1] "degree"
$IsVertical
[1] FALSE
$WktPretty
[1] "GEOGCS[\"WGS 84\",\n DATUM[\"WGS_1984\",\n SPHEROID[\"WGS 84\",6378137,298.257223563,\n AUTHORITY[\"EPSG\",\"7030\"]],\n AUTHORITY[\"EPSG\",\"6326\"]],\n PRIMEM[\"Greenwich\",0,\n AUTHORITY[\"EPSG\",\"8901\"]],\n UNIT[\"degree\",0.0174532925199433,\n AUTHORITY[\"EPSG\",\"9122\"]],\n AUTHORITY[\"EPSG\",\"4326\"]]"
$Wkt
[1] "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]"
$ud_unit
1 °
@edzer Karney's C library now bundled in here allowing proper benchmarking. The other routines (Haversine, Vincenty) are so fast because they pre-compute trigonometric quantities for the pairwise comparisons, so only do n
trig operations for n^2
comparisons. I should be able to do similar for Karney's routines, but re-writing them will be quite a chore. Anyway, output agrees with sf
to within a few nm.
This is the approach that also raster took. The disadvantages I see are:
but I guess this is the cost of wanting to be "ultra-lightweight". By handling CRS and linking to PROJ, sf
addresses both. As usual, the importance of all this depends on your application, but if you aim general purpose you may want to document this.
Thanks @edzer. Those are important and valid points, but at the end, you're right that this will just have to be considered part of the cost of being "ultra-lightweight". I've ported off all geodesic constants to a single header file to at least allow relatively straightforward updating and extension to other datums. The issue will be closed with subsequent commits which will document these potential restrictions.
In the benchmark on your README, you compare runtime with sf but not the outcomes, which are entirely different: as
crs
hasn't been set in the sf objects, it computes Euclidian distances. Setting it to4326
it will compute ellipsoidal distances inm
(use%>% st_sfc(crs = 4326)
). This will also increase your speed gain to a factor 40.The reason for the slowness of the library used by sf (PROJ) is that it computes ellipsoidal distances through the geographiclib routines by Karney (https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf), which approximate numerically up to 8 byte double precision. The paper points out the differences of this method, and Vincenty's - a recommended read.