hypertidy / geodist

Ultra lightweight, ultra fast calculation of geo distances
https://hypertidy.github.io/geodist/
Other
93 stars 7 forks source link

benchmark not meaningful; sf doesn't use Vincenty #11

Open edzer opened 6 years ago

edzer commented 6 years ago

In the benchmark on your README, you compare runtime with sf but not the outcomes, which are entirely different: as crs hasn't been set in the sf objects, it computes Euclidian distances. Setting it to 4326 it will compute ellipsoidal distances in m (use %>% st_sfc(crs = 4326)). This will also increase your speed gain to a factor 40.

The reason for the slowness of the library used by sf (PROJ) is that it computes ellipsoidal distances through the geographiclib routines by Karney (https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf), which approximate numerically up to 8 byte double precision. The paper points out the differences of this method, and Vincenty's - a recommended read.

mpadge commented 6 years ago

Thanks. The README did have a note about the unfairness of the sf comparison, but that got deleted along the way. We will soon integrate a proper Vincenty ellipsoidal routine, (it's currently just spherical) following which I'll fix the benchmark. In the meantime, I'll reinsert the note about the comparison not being fair

mpadge commented 6 years ago

And thanks for the link - very useful

edzer commented 6 years ago

For the comparison: lwgeom can also branch on spherical Earth, which would be more fair to compare against your spherical distances. I'm primarily interested in both packages giving identical (and hopefully correct) results.

mpadge commented 6 years ago

Yeah, that would be great. At the moment we can only do exact comparison with geosphere, but obviously critically important to also get exact lwgeom comparison. I'll ping you for help when it's that far along - or feel free to PR anytime! We need, for example, to find the precise radius used by lwgeon, which for geosphere is (over) "accurate" to cm.

edzer commented 6 years ago

lwgeom doesn't use any, but gets them through the crs passed; parameter values are obtained by sf from GDAL, and printed by

> sf::st_crs(st_sfc(st_point(), crs = 4326), parameters = TRUE)
$SemiMajor
6378137 m

$SemiMinor
6356752 m

$InvFlattening
[1] 298.2572

$units_gdal
[1] "degree"

$IsVertical
[1] FALSE

$WktPretty
[1] "GEOGCS[\"WGS 84\",\n    DATUM[\"WGS_1984\",\n        SPHEROID[\"WGS 84\",6378137,298.257223563,\n            AUTHORITY[\"EPSG\",\"7030\"]],\n        AUTHORITY[\"EPSG\",\"6326\"]],\n    PRIMEM[\"Greenwich\",0,\n        AUTHORITY[\"EPSG\",\"8901\"]],\n    UNIT[\"degree\",0.0174532925199433,\n        AUTHORITY[\"EPSG\",\"9122\"]],\n    AUTHORITY[\"EPSG\",\"4326\"]]"

$Wkt
[1] "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]"

$ud_unit
1 °
mpadge commented 6 years ago

@edzer Karney's C library now bundled in here allowing proper benchmarking. The other routines (Haversine, Vincenty) are so fast because they pre-compute trigonometric quantities for the pairwise comparisons, so only do n trig operations for n^2 comparisons. I should be able to do similar for Karney's routines, but re-writing them will be quite a chore. Anyway, output agrees with sf to within a few nm.

edzer commented 6 years ago

This is the approach that also raster took. The disadvantages I see are:

but I guess this is the cost of wanting to be "ultra-lightweight". By handling CRS and linking to PROJ, sf addresses both. As usual, the importance of all this depends on your application, but if you aim general purpose you may want to document this.

mpadge commented 6 years ago

Thanks @edzer. Those are important and valid points, but at the end, you're right that this will just have to be considered part of the cost of being "ultra-lightweight". I've ported off all geodesic constants to a single header file to at least allow relatively straightforward updating and extension to other datums. The issue will be closed with subsequent commits which will document these potential restrictions.