r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.32k stars 293 forks source link

How to estimate shortest-path distance on a network and related issues #966

Closed agila5 closed 5 years ago

agila5 commented 5 years ago

I'm trying to code a spatial logistic model of car crashes in the municipality of Milan using R and, in particular, sf ( and some related packages), but I'm facing some problems. I'm not too sure about my approach and I think that the best way to explain the difficulties I'm finding is to provide a small and reproducible example (sorry for this long post). I decided to report this issue here (and not on SO or GIS stackexchange) since it's strictly related to sf, tell me if this is not the right place.

I have a dataset containing the location of all car accidents that occurred in Milan during 2015 and these are the coordinates of the first 5 events (out of 8000 car crashes approximately).

library(dplyr)
library(sf)
milan_car_crashes <- data.frame(
  ID = 1:5,
  X = c(1513037, 1513008, 1515473, 1514039, 1515748),
  Y = c(5034945, 5034750, 5036177, 5036820, 5037396)
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

Car crashes are an example of a spatial point pattern constrained to lie on a network of lines and, since I don't have an up to date shapefile of the highways of Milan, I decided to use the package osmdata to download it from OpenStreetMap. Looking at the wiki page of the Map Features of OSM, I chose to download the data of primary highways (while, on the real model, I will use also secondary, tertiary, unclassified, residential, motorways and trunk) as a simple feature and extract the LINESTRING geometry, setting the same CRS as my data.

library(osmdata)
milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

Moreover, since car crashes represent a spatial point pattern that lie on a linear network, I think that the most appropriate way to measure the distance between any two points on the network is the shortest-path distance (and this is the biggest problem I'm facing as explained later). Firstly I want to check if the network of lines that I downloaded is connected or not (otherwise I don't think it's possible to measure a shortest-path distance), i.e. I have to verify that every highway is reachable from every other highway in the network. I coded this check using a combination of st_touches and hclust since I want to merge all streets touching each other (i.e. not only directly touching ones). This approach is really unsatisfactory since I can't use the sparse property of the matrix generated by st_touches (I don't know how to use hclust() on a sgbp object) therefore it's very memory consuming and it does not work if there are much more highways (i.e. I cannot use this approach if I consider also secondary, tertiary, unclassified and residentia highways) since the resulting matrix is too big. What's a better approach?

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
# streets stay in the same group
highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
## highways_groups
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 1444   21   18    8    4    5    8    2    3    5    2    2

There are 12 groups of connected highways and I filter out those streets not belonging to the big group otherwise I cannot calculate the shortest-path distance if two points belong to two distinct networks.

highways_sf <- highways_sf[highways_groups == 1, ]

The second step for the estimate of the shortest-path distance is the projection of the car crashes on the network and I performed this task using st_nearest_feature and st_nearest_point. This step is necessary since not every car crash belongs to the network due to approximation and rounding errors. Then again I'm not sure this is the right approach but I simply followed the examples reported on the help page of st_nearest_points. Is there a better alternative?

# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

I can now replace the geometry of the sf object with the new geometry of projected points.

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

This is a plot of the result

The biggest and most important problem is that I have no clue on how to estimate the shortest-path distance between two points belonging to the network of highways. How can I do that using sf?

My sessionInfo.

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
## [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Italian_Italy.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2 osmdata_0.0.9  sf_0.7-2       dplyr_0.7.8   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0       pillar_1.3.1     compiler_3.5.2   bindr_0.1.1     
##  [5] class_7.3-14     tools_3.5.2      digest_0.6.18    jsonlite_1.6    
##  [9] lubridate_1.7.4  evaluate_0.12    tibble_2.0.1     lattice_0.20-38 
## [13] pkgconfig_2.0.2  rlang_0.3.1      DBI_1.0.0        curl_3.3        
## [17] yaml_2.2.0       xfun_0.4         e1071_1.7-0.1    xml2_1.2.0      
## [21] stringr_1.3.1    httr_1.4.0       knitr_1.21       classInt_0.3-1  
## [25] grid_3.5.2       tidyselect_0.2.5 glue_1.3.0       R6_2.3.0        
## [29] rmarkdown_1.11   sp_1.3-1         purrr_0.3.0      magrittr_1.5    
## [33] htmltools_0.3.6  units_0.6-2      rvest_0.3.2      assertthat_0.2.0
## [37] stringi_1.2.4    crayon_1.3.4
edzer commented 5 years ago

Somewhat out of scope: sf is good to provide geometry representations for points and lines (nodes and edges in your street network) but not the infrastructure for representing the graph and doing shortest path. Most R packages use igraph for this. We did some similar experiments recently with @Robinlovelace , @loreabad6 and @luukvdmeer found here https://github.com/spnethack/spnethack; there is infrastructure in packages stplanr and osmar; older experiments using sp and igraph from me: https://github.com/edzer/spnetwork/

I think a relatively easy way to go with sf would be to equip tidygraph objects ("tidy" wrappers around igraph objects) with simple feature list-columns for nodes and edges, and work with those. I'm still hoping for a guest blog on r-spatial.org reporting about the spnethack experiments!

Robinlovelace commented 5 years ago

Shortest paths on route networks defined in sf objects can be done using the SpatialLinesNetwork function from the stplanr package, as shown in the reprex below. As Edzer says, we experimented with other ways of doing this and this seems that tidygraph/sf integration is possible and potentially useful here. More soon in the blog post (thanks for the reminder @edzer will look at it later today and push for changes for it to be 'camera ready', promise).

Note: may also be worth checking out the dodgr package which is also for routing.

Robinlovelace commented 5 years ago

Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with stplanr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

X = c(1513037, 1513008, 1515473, 1514039, 1515748)
Y = c(5034945, 5034750, 5036177, 5036820, 5037396)

milan_car_crashes <- data.frame(
  ID = 1:5,
  X = X,
  Y = Y
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
  # streets stay in the same group
  highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
#> highways_groups
#>    1    2    3    4    5    6    7    8    9   10   11   12 
#> 1444   21   18    8    4    5    8    2    3    5    2    2

highways_sf <- highways_sf[highways_groups == 1, ]

# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

plot(highways_sf$geometry)
plot(milan_car_crashes$geometry, add = TRUE)

library(stplanr)
rnet = SpatialLinesNetwork(highways_sf)

# calculate route from node 1 to 2:
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
plot(route_1_2, add = TRUE, col = "red", lwd = 3)
#> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring
#> all but the first attribute

# from 5 random nodes to 5 random nodes
n = length(rnet@nb)
bigger_od_data = data.frame(
  start = c(1, mean(n), max(n)),
  end = c(sample(n, size = 3))
  )
route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data)
plot(route_3_3$geometry, col = "blue", add = TRUE, lwd = route_3_3$count^2)


# from all locations to all locations
nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000)
plot(highways_sf$geometry)
plot(milan_car_crashes$geometry[c(1, 3)], add = TRUE)
crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3])
route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data)
plot(route_1_3_crashes$geometry, col = "blue", add = TRUE, lwd = 5)

Created on 2019-01-28 by the reprex package (v0.2.1)

jsta commented 5 years ago

An option that hasn't been mentioned thus far is to round-trip your analysis to GRASS using rgrass7.

agila5 commented 5 years ago

Thank you very much for your responses, they are extremely helpful. I wasn’t aware of any of the packages and approaches that you mentioned and I’ll study them in greater detail in the next days.

Could you also help me with the first question of the first post (i.e. how to determine which cluster of highways are connected) ? I’m not sure if I’m simply using the wrong function or the approach is completely off base. I’m pretty sure that I cannot just skip this part of the analysis because I don’t see how it’s possible to estimate a distance between two points that don’t belong to the same connected network. Moreover I’m not sure how this problem is handled in these packages (maybe the distance is set as Inf?)

Robinlovelace commented 5 years ago

First impression: not sure, an extended reproducible example in which you show your attempts and reasoning building on the above code could help.

This issue is better suited for a question and answer system such as stackoverflow, the r-sig-geo email list or RStudio's popular discourse site. This issue tracker is designed to report bugs and request features for the sf package so, while I think it's a great place to share code and ideas, I agree with @edzer that this is slightly off topic and suggest that, if you don't have any specific issues with the package, you ask a question in an appropriate forum and close this particular issue. Linking to the new place will allow people to find it and, hopefully, help provide useful answers to your question.

agila5 commented 5 years ago

Ok, thank you very much for your help and your patience. It was a really useful post (at least for me).

I'll try to create a better example for the problem and post it on SO or somewhere else.

gert08 commented 5 years ago

Here's the reprex, building on your reproducible code (many thanks for that) showing how it can be done with stplanr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

X = c(1513037, 1513008, 1515473, 1514039, 1515748)
Y = c(5034945, 5034750, 5036177, 5036820, 5037396)

milan_car_crashes <- data.frame(
  ID = 1:5,
  X = X,
  Y = Y
) %>%
  # Transform as sf and set the CRS (Gauss-Boaga)
  st_as_sf(coords = c("X", "Y"), 
           crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])

highways_sf <- milan_primary_highways$osm_lines %>%
  st_transform(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

# Create a matrix of touching streets
touching_highways <- st_touches(highways_sf, sparse = FALSE)
# Merge all streets that touch each other converting all
highways_hclust <- hclust(as.dist(!touching_highways), method = "single")
# Cut the dendrogram at heigh 0.5 so that all touching 
  # streets stay in the same group
  highways_groups <- cutree(highways_hclust, h = 0.5)
# Result
table(highways_groups)
#> highways_groups
#>    1    2    3    4    5    6    7    8    9   10   11   12 
#> 1444   21   18    8    4    5    8    2    3    5    2    2

highways_sf <- highways_sf[highways_groups == 1, ]

# Estimate the nearest point 
nearest_point <- milan_car_crashes %>%
  mutate(
    index_of_nearest_feature = st_nearest_feature(., highways_sf),
    nearest_feature = st_geometry(highways_sf[index_of_nearest_feature,]),
    nearest_point = purrr::pmap(
      list(geometry, nearest_feature),
      ~ st_nearest_points(.x, .y) %>% st_cast("POINT") %>% magrittr::extract2(2)
    )
  ) %>%
  pull(nearest_point) %>%
  st_sfc(crs = "+proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs")

milan_car_crashes <- milan_car_crashes %>% 
  st_drop_geometry() %>% 
  st_as_sf(., geometry = nearest_point) 

plot(highways_sf$geometry)
plot(milan_car_crashes$geometry, add = TRUE)

library(stplanr)
rnet = SpatialLinesNetwork(highways_sf)

# calculate route from node 1 to 2:
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
plot(route_1_2, add = TRUE, col = "red", lwd = 3)
#> Warning in plot.sf(route_1_2, add = TRUE, col = "red", lwd = 3): ignoring
#> all but the first attribute

# from 5 random nodes to 5 random nodes
n = length(rnet@nb)
bigger_od_data = data.frame(
  start = c(1, mean(n), max(n)),
  end = c(sample(n, size = 3))
  )
route_3_3 = sum_network_links(sln = rnet, routedata = bigger_od_data)
plot(route_3_3$geometry, col = "blue", add = TRUE, lwd = route_3_3$count^2)

# from all locations to all locations
nodes_near = stplanr::find_network_nodes(sln = rnet, x = X, y = Y, maxdist = 2000)
plot(highways_sf$geometry)
plot(milan_car_crashes$geometry[c(1, 3)], add = TRUE)
crash_od_data = data.frame(start = nodes_near[1], end = nodes_near[3])
route_1_3_crashes = sum_network_links(sln = rnet, routedata = crash_od_data)
plot(route_1_3_crashes$geometry, col = "blue", add = TRUE, lwd = 5)

Created on 2019-01-28 by the reprex package (v0.2.1)

Hi! I tried to reproduce this code and I get an error.

milan_primary_highways <- opq("Milan, Italy") %>%

Error in rcpp_osmdata_sf(doc) : node can not be found

Also, with bb_poly I can just use the place name to obtain the trimmed polygons but if I have 4 coordinates thar cover a little more than the city, what would you sudgest?

agila5 commented 5 years ago

Hi. I'm sure that @Robinlovelace or someone else can answer you much better than me but I don't get any error with the following code.

library(osmdata)
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright

milan_primary_highways <- opq("Milan, Italy") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%
  trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]])
#> Loading required namespace: sf
milan_primary_highways
#> Object of class 'osmdata' with:
#>                  $bbox : 45.3867381,9.0408867,45.5358482,9.2781103
#>         $overpass_call : The call submitted to the overpass API
#>                  $meta : metadata including timestamp and version numbers
#>            $osm_points : 'sf' Simple Features Collection with 4242 points
#>             $osm_lines : 'sf' Simple Features Collection with 1662 linestrings
#>          $osm_polygons : 'sf' Simple Features Collection with 0 polygons
#>        $osm_multilines : NULL
#>     $osm_multipolygons : NULL

Created on 2019-07-04 by the reprex package (v0.2.1)

janemumei commented 4 years ago

Thanks for your reprex, @Robinlovelace. But, I run into an issue with these few lines.

rnet = SpatialLinesNetwork(highways_sf)
simple_od_data = data.frame(start = 1, end = 2)
route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)

When highways_sf is an sf object, I get the error:

> rnet = SpatialLinesNetwork(highways_sf)
Error in validityMethod(object) : 
  nrow(object@sl) == length(igraph::E(object@g)) is not TRUE

When highways_sf is an sp object, I get an error running sum_network_links:

> route_1_2 = sum_network_links(sln = rnet, routedata = simple_od_data)
Error in stplanr::sum_network_links(sln = net, routedata = simple_od_data) : 
  SpatialLinesNetwork not supported. Use newer sfNetwork class instead

Any clue how to create an sfNetwork object or get sum_network_links to work on a SpatialLinesNetwork?

edzer commented 4 years ago

Please open/continue this issue at the right repo.

Robinlovelace commented 4 years ago

@jonoyuan this is the repo Edzer refers to: https://github.com/ropensci/stplanr/issues