luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

Intersection simplifier for sfnetworks #104

Closed idshklein closed 3 years ago

idshklein commented 3 years ago

Very often, osm networks (and networks in general)) contains edges that represent the different legs of a junction or roundabouts. When one comes to analyze a network, sometimes this level of detail is too much, as it fails to represent streets as a common person would define them. In order to solve this complexity, an intersection simplifying technique is commonly used. I know of intersection simplifiers in MATSim and OSMnx.

According to my understanding, the algorithm to simplify intersections is based on three main parts:

  1. Finding a cluster of points which are close enough to each other and their centroid (using dbscan)
  2. Figuring out, using network analysis, whether the points attached together (by filtering out a subgraph per cluster and making sure there is one component only), and filtering out those who are not
  3. Changing the geometry of the graph, by deleting nodes and edges related to the clusters, and attaching the disconnected edges to the centroid which represents the cluster (while also changing the edge geometry).

Here is a draft of parts 1 and 2:

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3
#> Warning: package 'tidyr' was built under R version 4.0.3
#> Warning: package 'readr' was built under R version 4.0.3
#> Warning: package 'dplyr' was built under R version 4.0.3
library(sf)
#> Warning: package 'sf' was built under R version 4.0.3
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(tidygraph)
#> 
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(sfnetworks)
library(dbscan)
#> Warning: package 'dbscan' was built under R version 4.0.3
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract
# PART 1
# transforming to projected in order to use meters with dbscan, filtering out non-roads
roxel_transformed <- st_transform(roxel,3035) %>% 
  filter(!type %in% c("path","footway","track","pedestrian","cycleway"))
# initial network
ggplot(roxel_transformed) + 
  geom_sf()

# smoothing to prevent unnecesary identification of nodes in clusters
G <- roxel_transformed %>% 
  as_sfnetwork() %>% 
  convert(to_spatial_smooth)
# performing dbscan on nodes
nodes <- G %>% 
  st_as_sf()
nodes$cluster <- nodes %>% 
  st_coordinates() %>% 
  dbscan(15,2) %>% 
  extract("cluster") %>% 
  extract2(1)
# finding centroid of each cluster
centroids <- nodes %>% 
  group_by(cluster) %>% 
  summarise() %>% 
  ungroup() %>% 
  st_centroid()
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
#> geometries of x
# result
p <- ggplot(roxel_transformed) + 
  geom_sf() + 
  geom_sf(data = nodes %>% filter(cluster!= 0),color = "red") + 
  geom_sf(data = centroids%>% filter(cluster!= 0), color = "green")
p

# a roundabout
p + coord_sf(xlim = c(4151510,4151540),ylim = c(3207550,3207620))

#PART 2
# tmp table to join with graph nodes
to_join <- tibble(cluster = nodes$cluster) %>% 
  left_join(centroids)
#> Joining, by = "cluster"
# finding which clusters are fully connected, i.e. have only one component
G %>% 
  activate(nodes) %>% 
  mutate(cluster = to_join$cluster,
         centroid = to_join$geometry) %>% 
  filter(cluster!=0) %>% 
  group_by(cluster) %>% 
  ungroup() %>% 
  morph(to_split, cluster) %>% 
  mutate(components = graph_component_count()) %>% 
  unmorph() %>% 
  filter(components == 1)
#> Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> Please use the `.add` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#> Subsetting by nodes
#> # A sfnetwork with 76 nodes and 54 edges
#> #
#> # CRS:  EPSG:3035 
#> #
#> # A directed multigraph with 24 components with spatially explicit edges
#> #
#> # Node Data:     76 x 4 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 4150707 ymin: 3206662 xmax: 4152036 ymax: 3208543
#>            geometry cluster          centroid components
#>         <POINT [m]>   <int>       <POINT [m]>      <dbl>
#> 1 (4151732 3207017)       1 (4151726 3207014)          1
#> 2 (4151977 3207543)       2 (4151978 3207543)          1
#> 3 (4151543 3207742)       3 (4151544 3207750)          1
#> 4 (4151643 3207125)       4 (4151639 3207125)          1
#> 5 (4150716 3207080)       5 (4150712 3207081)          1
#> 6 (4151069 3206690)       9 (4151073 3206687)          1
#> # ... with 70 more rows
#> #
#> # Edge Data:     54 x 4
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 4150707 ymin: 3206662 xmax: 4152036 ymax: 3208543
#>    from    to .orig_data                                 geometry
#>   <int> <int> <list>                             <LINESTRING [m]>
#> 1     6     7 <tibble [1 x 6]> (4151069 3206690, 4151034 3206674)
#> 2     8     7 <tibble [1 x 6]> (4151039 3206662, 4151034 3206674)
#> 3     6     9 <tibble [1 x 6]> (4151069 3206690, 4151075 3206678)
#> # ... with 51 more rows

Created on 2020-12-21 by the reprex package (v0.3.0)

Due to the fact that I do not have a deep understanding of the bolts and cogs of tidygraph and sfnetworks, I don't know how to implement the third part. Therefore, 3 questions:

  1. Am I missing something?
  2. Does sfnetworks need a intersection simplifier?
  3. How does one implements the third part?
luukvdmeer commented 3 years ago

Good to see such a nice workflow already, using sfnetworks! Sounds like something to implement indeed. Will look into the details and the third part asap

luukvdmeer commented 3 years ago

I think with current functionalities, most notably the tidygraph::to_contracted() morpher, we already come close. Improvements are still needed. Some issues:

Adressing these issues in a specific to_spatial_contracted morpher would bring us a lot further!

Reprex:

suppressPackageStartupMessages({
  library(sfnetworks)
  library(sf)
  library(tidygraph)
  library(tidyverse)
  library(magrittr)
  library(dbscan)
})

G = roxel %>%
  st_transform(3035) %>%
  filter(!type %in% c("path", "footway", "track", "pedestrian", "cycleway")) %>%
  as_sfnetwork()

# By setting minPts to 1 we put every node in a cluster
# Even if the node is the only member of that cluster
clusters = dbscan(st_coordinates(G), eps = 15, minPts = 1)$cluster

G = G %>%
  mutate(cluster = clusters)
G
#> # A sfnetwork with 597 nodes and 641 edges
#> #
#> # CRS:  EPSG:3035 
#> #
#> # A directed multigraph with 16 components with spatially explicit edges
#> #
#> # Node Data:     597 x 2 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 4150707 ymin: 3206375 xmax: 4152367 ymax: 3208543
#>            geometry cluster
#>         <POINT [m]>   <int>
#> 1 (4151491 3207923)       1
#> 2 (4151474 3207946)       2
#> 3 (4151398 3207777)       3
#> 4 (4151370 3207673)       4
#> 5 (4151408 3207539)       5
#> 6 (4151421 3207592)       6
#> # … with 591 more rows
#> #
#> # Edge Data:     641 x 5
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 4150707 ymin: 3206375 xmax: 4152367 ymax: 3208543
#>    from    to name            type                                      geometry
#>   <int> <int> <chr>           <fct>                             <LINESTRING [m]>
#> 1     1     2 Havixbecker St… resident…       (4151491 3207923, 4151474 3207946)
#> 2     3     4 Pienersallee    secondary (4151398 3207777, 4151390 3207727, 4151…
#> 3     5     6 Schulte-Bernd-… resident… (4151408 3207539, 4151417 3207573, 4151…
#> # … with 638 more rows

# We can now contract nodes using the to_contracted morpher
# We use the cluster column to group the nodes, and all groups are contracted to a single node
# Edges are updated accordingly such that connectivity is preserved
G_new = G %>%
  convert(to_contracted, cluster)

# When using the tidygraph morpher, the results needs some 'unpacking' to be useful
G_new
#> # A sfnetwork with 508 nodes and 553 edges
#> #
#> # CRS:  EPSG:3035 
#> #
#> # A directed simple graph with 16 components with spatially implicit edges
#> #
#> # Node Data:     508 x 4 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 4150716 ymin: 3206375 xmax: 4152367 ymax: 3208539
#>   cluster          geometry .orig_data       .tidygraph_node_index
#>     <int>       <POINT [m]> <list>           <list>               
#> 1       1 (4151491 3207923) <tibble [1 × 1]> <int [1]>            
#> 2       2 (4151474 3207946) <tibble [1 × 1]> <int [1]>            
#> 3       3 (4151398 3207777) <tibble [1 × 1]> <int [1]>            
#> 4       4 (4151370 3207673) <tibble [1 × 1]> <int [1]>            
#> 5       5 (4151408 3207539) <tibble [1 × 1]> <int [1]>            
#> 6       6 (4151421 3207592) <tibble [1 × 1]> <int [1]>            
#> # … with 502 more rows
#> #
#> # Edge Data: 553 x 4
#>    from    to .tidygraph_edge_index .orig_data      
#>   <int> <int> <list>                <list>          
#> 1     1     2 <int [1]>             <tibble [1 × 5]>
#> 2     1   472 <int [1]>             <tibble [1 × 5]>
#> 3     2    14 <int [1]>             <tibble [1 × 5]>
#> # … with 550 more rows

G_n = st_as_sf(G, "nodes")
G_e = st_as_sf(G, "edges")
G_new_n = st_as_sf(G_new, "nodes")
G_new_e = st_as_sf(do.call(rbind, pull(activate(G_new, "edges"), .orig_data)))

# Plot
p = ggplot() +
  geom_sf(data = G_e, lwd = 2, color = "darkgrey") +
  geom_sf(data = G_n, cex = 3) +
  geom_sf(data = G_new_e, color = "red") +
  geom_sf(data = G_new_n, color = "red")

p + coord_sf(xlim = c(4151510,4151540),ylim = c(3207550,3207620))

Created on 2020-12-24 by the reprex package (v0.3.0)

luukvdmeer commented 3 years ago

The spatial contraction is now implemented as a morpher in the new release, see this section in the morpher vignette for a general example, and this section in the cleaning vignette for an example focused specifically on simplifying intersections.

I prefer to stay with the contraction morpher and not implement something on top of that addressing only intersection simplification. The current structure gives a lot of freedom in choosing your own spatial clustering algorithm, and adding any other grouping variables you want to use.

The function is not super, super fast by the way. This is mainly because of st_centroid, which seems to be a rather expensive computation. On Roxel it now runs in +/- 1.5 seconds.