Closed rrmn closed 5 years ago
Hi Roman, thank you, but distGeo is vectorized, Have you used it like below?
library(geosphere)
f_geo <- function(m) {
distGeo(m[-nrow(m),], m[-1,])
}
n <- 1000000
m <- cbind(x = runif(n, -180, 180), y = runif(n, -60, 60))
system.time(f_geo(m))
# user system elapsed
#1 1.92 0.04 1.95
or the simplest (where the above is done automatically)
d <- distGeo(m)
There is a little overhead in distGeo mainly having to do with recycling that you may not need, and could skip, but that part is probably very fast.
HA! That's pretty cool! Thank you a lot. If you would want to add this to a dataframe / tibble it would then be something like
m$dist <- c(0, distGeo(m))
Right?
And, to put a cherry on top, would you have an idea how to integrate this into a tidyverse pipe — maybe even with named columns?
If would be
m$dist <- distGeo(m)
I am not tidy-versed so I cannot help you there.
The resulting vector is of length = nrow(m)-1
, therefore you cannot straight up assign it to a column in a data frame, can you? (Also it does not work with nrow(m) <= 2
, but that's probably a tiny limitation)
Ah, yes, I am using the github version which returns a vector of length = nrow(m) and catches zero or 1 row cases. R
On Sat, May 25, 2019 at 12:59 PM Roman Abashin notifications@github.com wrote:
The resulting vector is of length = nrow(m)-1, therefore you cannot straight up assign it to a column in a data frame, can you? (Also it does not work with nrow(m) <= 2, but that's probably a tiny limitation)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rspatial/geosphere/issues/2?email_source=notifications&email_token=ACXTCNPDP47FWFMICPM2GFTPXGLDFA5CNFSM4HPUC3CKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWHYH2A#issuecomment-495944680, or mute the thread https://github.com/notifications/unsubscribe-auth/ACXTCNPSXKOGDDHSYZLRKV3PXGLDFANCNFSM4HPUC3CA .
Hi Robert,
I'm working with large volumes of data where I calculate geodesic distances between spatial points, usually distances from one observation (= row) to the next. Currently, the datasets amounts to ~500 million rows each.
For this type of problem, most threads on Stackoverflow recommend a
rowwise
(tidyverse) approach with one of geosphere's functions, or trying a simpler algorithm (e.g., by projecting the data).Due to the need of fast performance (i.e., minutes, not days) and accurate calculation (i.e., at least geodesic) I made a quick & dirty vectorized implementation of your fantastic
distGeo
function (the inverse problem, that is).https://github.com/RomanAbashin/distGeo_v
What do you think? Maybe it would make sense to think about incorporating something like this into geosphere? Or did I miss an approach that was already present?