vlarmet / cppRouting

Algorithms for Routing and Solving the Traffic Assignment Problem
106 stars 9 forks source link

Enable lon/lat coordinates? #11

Closed edwardlavender closed 1 year ago

edwardlavender commented 1 year ago

This package is brilliant & I use it all the time (mainly for calculating the shortest distances between locations). This is a feature request - it would be really useful if the get_distance_*() functions in the package could handle longitude and latitude coordinates. I appreciate that you can project lon/lat coordinates, but this can induce distortions which are undesirable & cause issues around the International Date Line. Ideally, it would be possible to supply a data frame with from/to/cost to makegraph(), where cost represents geodesic distances, with lon/lat coords supplied to the coords argument of the same function, and then the package (e.g., get_distance_matrix()) would correctly compute shortest distances between selected nodes. Is this possible?

vlarmet commented 1 year ago

Hi, Thank you for your nice feedback about cppRouting. Coordinates are used in two one-to-one algorithms, available in get_*_pair() functions : A and NBA (New Bidirectionnal A). Coordinates are used to compute an heuristic fonction (euclidean distance) to decrease search space of Dijkstra algorithm. So if cost are correct geodesic distances between nodes, the shorest path between 2 nodes must be correct since algorithms minimize the sum of distances. To avoid errors, I would not use A or NBA since heuristic is expressed in Euclidean distance. Let me kown if I don't understand correctly your question !

edwardlavender commented 1 year ago

Thanks for your prompt response! I think you are saying that lon/lat coordinates can be used as long as the cost is defined correctly as the geodesic distances between nodes? However, I have observed an issue with distances not correctly wrapping around the international date line (i.e., the distance between (-179, 0) and (179, 0) is much longer than expected). Perhaps I am misunderstanding something. Here is a reproducible example:

#### cppRouting version
packageVersion("cppRouting")
# > 3.1

#### Define example dataset 
# * We want to calculate shortest distances from a point 
# ... on a lon/lat RasterLayer to the cells of that RasterLayer.
# * Normally, I would do this if there are barriers in the landscape e.g., coastline, 
# ... but for simplicity we will just ignore that here. 
# * We will choose a point near the International Date Line
# ... to check distance calculations.
surface   <- raster::raster(res = 5, crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
surface[] <- runif(raster::ncell(surface), 0, 1)
centroid  <- cbind(-179, 0)

#### Calculate Euclidean distances from point (for comparison)
reuc <- raster::distanceFromPoints(surface, centroid)
reuc <- raster::mask(reuc, surface)

#### Calculate shortest distances from point via cppRouting 
# Define point (node)
centroid_cell <- raster::cellFromXY(surface, centroid)
centroid_cpp  <- sprintf("%.f", centroid_cell)
# Define cells on raster
cells  <- raster::as.data.frame(surface)
cells  <- data.table::data.table(cell = seq_len(nrow(cells)), 
                                 euc = reuc[],
                                 value = cells[, 1])
cells[, cell_cpp := sprintf("%.f", cell)]
# Define distances between adjacent cells on raster
edges <- SpaDES.tools::adj(surface, cells = cells$cell, directions = 8)
edges <- data.table::as.data.table(edges)
edges[, cost := raster::pointDistance(raster::xyFromCell(surface, edges$from), 
                                      raster::xyFromCell(surface, edges$to), 
                                      lonlat = TRUE)]
edges[, from := sprintf("%.f", from)]
edges[, to := sprintf("%.f", to)]
# Make graph
cells[, x := raster::xFromCell(surface, cells$cell)]
cells[, y := raster::yFromCell(surface, cells$cell)]
coord <- data.frame(node = cells$cell_cpp, X = cells$x, Y = cells$y)
graph <- cppRouting::makegraph(df = edges, directed = TRUE, coords = coord)
# Compute shortest distances from centroid to all non NA cells
cells <- cells[!is.na(value), ]
clcp <- cppRouting::get_distance_matrix(
  Graph = graph,
  from = centroid_cpp,
  to = cells$cell_cpp)
cells[, clcp := as.numeric(clcp)[match(cells$cell_cpp, dimnames(clcp)[[2]])]]

#### Calculate shortest distances via gDistance (for comparison)
# Define a transition matrix where the ease of movement between cells is equal
tr <- gdistance::transition(surface, function(...) 1, directions = 8, symm = TRUE)
# Use gdistance::geoCorrection to account for lon/lat coords
tr <- gdistance::geoCorrection(tr, type = "c")
cells$glcp <- gdistance::costDistance(x = tr, fromCoords = centroid, 
                                      toCoords = as.matrix(cells[, c("x", "y")]))[1, ]
# Compare cppRouting and gDistance outputs
all.equal(cells$clcp, cells$glcp)

#### Convert euclidean/shortest distances to rasters for plotting 
reuc <- surface
reuc[cells$cell] <- cells$euc
rclcp <- surface
rclcp[cells$cell] <- cells$clcp
rglcp <- surface
rglcp[cells$cell] <- cells$glcp

#### Plot results
# Set plotting window
pp <- par(mfrow = c(2, 3), oma = c(2, 2, 2, 2), mar = c(2, 2, 2, 2))
# Plot lcp ~ euc (expect approx. straight line)
plot(cells$euc, cells$clcp); abline(0, 1, col = "red")
plot(cells$euc, cells$glcp); abline(0, 1, col = "red")
plot(0, type = "n", axes = FALSE, xlab = "", ylab = "")
# Plot Euclidean distances map
# ... This correctly shows the closeness between -179 and +179 (etc.)
raster::plot(reuc, main = "Euclidean")
points(centroid)
# Plot cppRouting shortest distances maps
# ... This does not seem to account for the international date line
# ... i.e., -179 and + 179 are far apart 
raster::plot(rclcp, main = "cppRouting")
points(centroid)
# Plot gDistance shortest distances maps
raster::plot(rglcp, main = "gDistance")
points(centroid)
par(pp)

By the way, in the code above, I am using sprintf before cppRouting functions to define nodes because I have sometimes experienced errors with 'nodes not in graph' when numbers are passed to functions. I think this comes from the use of as.character() in cppRouting functions, such as makegraph() (e.g., for a node that is 2e5, this gets converted to "2e+05" rather than "200000").

vlarmet commented 1 year ago

Well, it seems that there are some missing edges connecting -180 and 180 longitudes in cppRouting network. For example, node 1225 (-179, 0) should be connected to 1296 (179,0) since these cells are contiguous like in a cylinder, right ? If I set centroid <- cbind(0, 0), results are equals. For your information, get_distance_matrix don't use coords attribute.

edwardlavender commented 1 year ago

Thank you for your feedback! It looks like there is an issue with how I prepared the adjacency matrix via adj() in the above example then. To summarise, when it comes to cppRouting:

  1. In get_*_pair() functions, you can include lon/lat coordinates and the shortest distances will be calculated, if the cost is defined as the geodesic distance correctly (and there is not a 'missing' edge at -180/+180).
  2. In other functions, such as get_distance_matrix(), coordinates are not used, so you can't use lon/lat data in those functions.

Is that right?

Thanks again for your advice & for making this great package available.

vlarmet commented 1 year ago

If cost are correct distance, every function will work. Coordinates are only used in get_*_pair() function with A* and NBA algorithms. These algorithms will give you the same result but it is faster.