njtierney / maxcovr

Tools in R to make it easier to solve the Maximal Coverage Location Problem
https://njtierney.github.io/maxcovr/
GNU General Public License v3.0
43 stars 12 forks source link

Add support for OSM dodgr distances #77

Open njtierney opened 5 years ago

njtierney commented 5 years ago

This might involve some changing of the way that the user interacts with the API.

Currently, the distances are calculated:

So, it might be the case that a more custom/stripped down version of the maxcovr functions (max_coverage) work where you provide the distances.

See also #64, #50, and #51

cc @mpadge

mpadge commented 5 years ago

Actually, once #50 has been implemented, this can be mostly implement in prototype-form as a single-line replacement of geodist with dodgr::dists, although I think a better option would be to allow users to submit a custom distance matrix to max_coverage. Only real change to the API would be this one additional parameter with default of NULL reverting to current/geodist behaviour. You'd then only need to illustrate in the examples how street network distances could be calculated and used.

There is of course the issue of initial selecting the "user_not_covered" locations, which is also distance-based, and so should also use the submitted matrix, but that would be straightforward. Similar options would then have to offered in other places, like nearest.

The code needed to get a substitute distance matrix would be something like this:

net <- dodgr_streetnet_sc ("york, england", quiet = FALSE) %>%
    weight_streetnet (wt_profile = "foot")
from <- match_points_to_graph (net, user_not_covered [, c ("long_to", "lat_to")])
to <- match_points_to_graph (net, proposed_facility [, c ("long", "lat")])
d <- dodgr_dists (net, from = from, to = to)

but that could be customized for usage here in some way if need be.


here's a reprex for some timing:

library (maxcovr)
library (dodgr)
suppressMessages (library (dplyr))

existing_facility <- york_selected <- york %>% filter(grade == "I")
proposed_facility <- york_unselected <- york %>% filter(grade != "I")
user <- york_crime 
distance_cutoff <- 100

n <- nearest (york_selected, york_crime)
user_not_covered <- n [n$distance > distance_cutoff, 1:15]

tdiff <- function (t1, t2) formatC (t2 - t1, format = "f", digits = 1)
t0 <- proc.time ()
net <- dodgr_streetnet_sc ("york, england", quiet = FALSE) %>%
    weight_streetnet (wt_profile = "foot")
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sc format
#> Loading required namespace: geodist
t1 <- proc.time ()
message (rep ("*", 20), " elapsed time = ", tdiff (t0 [3], t1 [3]), " s")
#>  ******************** elapsed time = 11.1 s
from <- match_points_to_graph (net, user_not_covered [, c ("long_to", "lat_to")])
#> First argument to match_pts_to_graph should be result of dodgr_vertices;
#> presuming you've submitted the network itself and will now try extracting the vertices
to <- match_points_to_graph (net, proposed_facility [, c ("long", "lat")])
#> First argument to match_pts_to_graph should be result of dodgr_vertices;
#> presuming you've submitted the network itself and will now try extracting the vertices
t2 <- proc.time ()
message (rep ("*", 20), " (that bit took ", tdiff (t1 [3], t2 [3]), " s)")
#>  ******************** (that bit took 14.9 s)
d <- dodgr_dists (net, from = from, to = to)
t3 <- proc.time ()
message (rep ("*", 20), " distance calculation took ", tdiff (t2 [3], t3 [3]), " s; ",
         "in total that all took ", tdiff (t0 [3], t3 [3]), " s")
#> ******************** distance calculation took 7.9 s; in total that all took 33.9 s

dref <- geodist::geodist (user_not_covered, proposed_facility)
dref <- data.frame ("dodgr" = as.numeric (d),
                    "geodist" = as.numeric (dref))
dref <- dref [which (!is.na (rowSums (dref))), ]
library (ggplot2)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
theme_set (theme_minimal ())
ggplot (dref, aes (geodist, dodgr)) +
    geom_hex () +
    geom_abline (aes (intercept = 0, slope = 1), colour = "red")

Created on 2019-06-05 by the reprex package (v0.3.0)

so the whole thing for York and 1,475 * 2,873 = 4,237,675 distances only took just over 30 s. (Half of that was matching the points to the graph, and that shouldn't take so long, so I'll sniff around a bit there and speed that up.)

summary

only point of all this really is to show that

  1. straight-line distances are quite a poor substitute for street network distances; and
  2. relationships between the two are quite non-linear, and so you'll get propagating effects of non-linear errors in your results through using straight line distances.

Edit: Current slowness is due to this dodgr issue, which will be easy to solve, and should enable above example to return 5 million street distances from scratch in under 20s.

njtierney commented 5 years ago

Wow this is great!

In particular, the point that:

  1. straight-line distances are quite a poor substitute for street network distances; and
  2. relationships between the two are quite non-linear, and so you'll get propagating effects of non-linear errors in your results through using straight line distances.

Are so important, this would be a great figure in our paper :)

mpadge commented 5 years ago

Extra points from Monash convo:

  1. Use geodist with cheap distance by default; if dmax>100km, then issue message stating that Haversine will be used.
  2. Put in man entry somewhere a note that other distances (such as geodesic) can be used by submitted a custom distance matrix