Open njtierney opened 5 years ago
Actually, once #50 has been implemented, this can be mostly implement in prototype-form as a single-line replacement of geodist
with dodgr::dists
, although I think a better option would be to allow users to submit a custom distance matrix to max_coverage
. Only real change to the API would be this one additional parameter with default of NULL
reverting to current/geodist behaviour. You'd then only need to illustrate in the examples how street network distances could be calculated and used.
There is of course the issue of initial selecting the "user_not_covered" locations, which is also distance-based, and so should also use the submitted matrix, but that would be straightforward. Similar options would then have to offered in other places, like nearest
.
The code needed to get a substitute distance matrix would be something like this:
net <- dodgr_streetnet_sc ("york, england", quiet = FALSE) %>%
weight_streetnet (wt_profile = "foot")
from <- match_points_to_graph (net, user_not_covered [, c ("long_to", "lat_to")])
to <- match_points_to_graph (net, proposed_facility [, c ("long", "lat")])
d <- dodgr_dists (net, from = from, to = to)
but that could be customized for usage here in some way if need be.
here's a reprex for some timing:
library (maxcovr)
library (dodgr)
suppressMessages (library (dplyr))
existing_facility <- york_selected <- york %>% filter(grade == "I")
proposed_facility <- york_unselected <- york %>% filter(grade != "I")
user <- york_crime
distance_cutoff <- 100
n <- nearest (york_selected, york_crime)
user_not_covered <- n [n$distance > distance_cutoff, 1:15]
tdiff <- function (t1, t2) formatC (t2 - t1, format = "f", digits = 1)
t0 <- proc.time ()
net <- dodgr_streetnet_sc ("york, england", quiet = FALSE) %>%
weight_streetnet (wt_profile = "foot")
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sc format
#> Loading required namespace: geodist
t1 <- proc.time ()
message (rep ("*", 20), " elapsed time = ", tdiff (t0 [3], t1 [3]), " s")
#> ******************** elapsed time = 11.1 s
from <- match_points_to_graph (net, user_not_covered [, c ("long_to", "lat_to")])
#> First argument to match_pts_to_graph should be result of dodgr_vertices;
#> presuming you've submitted the network itself and will now try extracting the vertices
to <- match_points_to_graph (net, proposed_facility [, c ("long", "lat")])
#> First argument to match_pts_to_graph should be result of dodgr_vertices;
#> presuming you've submitted the network itself and will now try extracting the vertices
t2 <- proc.time ()
message (rep ("*", 20), " (that bit took ", tdiff (t1 [3], t2 [3]), " s)")
#> ******************** (that bit took 14.9 s)
d <- dodgr_dists (net, from = from, to = to)
t3 <- proc.time ()
message (rep ("*", 20), " distance calculation took ", tdiff (t2 [3], t3 [3]), " s; ",
"in total that all took ", tdiff (t0 [3], t3 [3]), " s")
#> ******************** distance calculation took 7.9 s; in total that all took 33.9 s
dref <- geodist::geodist (user_not_covered, proposed_facility)
dref <- data.frame ("dodgr" = as.numeric (d),
"geodist" = as.numeric (dref))
dref <- dref [which (!is.na (rowSums (dref))), ]
library (ggplot2)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
theme_set (theme_minimal ())
ggplot (dref, aes (geodist, dodgr)) +
geom_hex () +
geom_abline (aes (intercept = 0, slope = 1), colour = "red")
Created on 2019-06-05 by the reprex package (v0.3.0)
so the whole thing for York and 1,475 * 2,873 = 4,237,675 distances only took just over 30 s. (Half of that was matching the points to the graph, and that shouldn't take so long, so I'll sniff around a bit there and speed that up.)
only point of all this really is to show that
Edit: Current slowness is due to this dodgr
issue, which will be easy to solve, and should enable above example to return 5 million street distances from scratch in under 20s.
Wow this is great!
In particular, the point that:
- straight-line distances are quite a poor substitute for street network distances; and
- relationships between the two are quite non-linear, and so you'll get propagating effects of non-linear errors in your results through using straight line distances.
Are so important, this would be a great figure in our paper :)
Extra points from Monash convo:
This might involve some changing of the way that the user interacts with the API.
Currently, the distances are calculated:
maxcovr
using long and lat columnsSo, it might be the case that a more custom/stripped down version of the
maxcovr
functions (max_coverage
) work where you provide the distances.See also #64, #50, and #51
cc @mpadge