luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

to_spatial_smooth morpher + edge fields #156

Closed agila5 closed 2 years ago

agila5 commented 3 years ago

Will fix #124 (hopefully). For the moment, I will just show the new functionality.

Load packages

suppressPackageStartupMessages({
library(sf)
library(tidygraph)
library(sfnetworks)
library(tmap)
})

Create simulated data

fake_sf <- st_sf(
  data.frame(
    type1 = c("a", "b", "b", "c"),
    type2 = c("d", "e", "e", "f"),
    type3 = c("f", "g", "h", "m"),
    type4 = 1:4
  ),
  geometry = st_sfc(
    st_linestring(rbind(c(1, 0.5), c(1, 0))),
    st_linestring(rbind(c(1, 0), c(2, 0))),
    st_linestring(rbind(c(2, 0), c(2, 0.5))),
    st_linestring(rbind(c(1, 1), c(2, 1))),
    crs = 4326
  ),
  agr = "constant"
)

Plot

tm_shape(fake_sf, ext = 1.3) +
 tm_lines(lwd = 2, col = "type1") +
tm_shape(st_cast(fake_sf, "POINT")) +
  tm_dots(size = 0.3) +
tm_graticules(lines = FALSE)

and create the sfn object

fake_sfn <- as_sfnetwork(fake_sf, directed = FALSE)

I think the current behaviour of to_spatial_smooth morpher (wrt to the fields) could be summarised as follows: the fields of the merged segments are all set to NA, while the other fields are kept identical to their original values.

(current_transformation <- convert(fake_sfn, to_spatial_smooth, .clean = TRUE))
#> # A sfnetwork with 4 nodes and 2 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # An unrooted forest with 2 trees with spatially explicit edges
#> #
#> Registered S3 method overwritten by 'cli':
#>   method     from         
#>   print.boxx spatstat.geom
#> # Node Data:     4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0.5 xmax: 2 ymax: 1
#>      geometry
#>   <POINT [°]>
#> 1     (1 0.5)
#> 2     (2 0.5)
#> 3       (1 1)
#> 4       (2 1)
#> #
#> # Edge Data:     2 x 7
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>    from    to type1 type2 type3 type4                 geometry
#>   <int> <int> <chr> <chr> <chr> <int>         <LINESTRING [°]>
#> 1     3     4 c     f     m         4               (1 1, 2 1)
#> 2     1     2 <NA>  <NA>  <NA>     NA (1 0.5, 1 0, 2 0, 2 0.5)

Hence…

tm_shape(current_transformation %E>% st_as_sf(), ext = 1.4) +
  tm_lines(lwd = 2, col = "type1") +
tm_shape(current_transformation %N>% st_as_sf()) +
  tm_dots(size = 0.3) +
tm_graticules(lines = FALSE)

Now I want to modify the behaviour of to_spatial_smooth morpher according to one or more fields. In particular, now I say that a node is a pseudo-node if and only if it has one incoming and one outgoing edge (or simply two incident edges in the case of undirected networks), and all chosen attributes of these two edges are equal. Moreover, the merged edges should preserve the values of the fields specified in the extra_fields argument (since they are identical). So, for example,

(new_transformation <- convert(fake_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = "type1"))
#> # A sfnetwork with 5 nodes and 3 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # An unrooted forest with 2 trees with spatially explicit edges
#> #
#> # Node Data:     5 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>      geometry
#>   <POINT [°]>
#> 1     (1 0.5)
#> 2       (1 0)
#> 3     (2 0.5)
#> 4       (1 1)
#> 5       (2 1)
#> #
#> # Edge Data:     3 x 7
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>    from    to type1 type2 type3 type4          geometry
#>   <int> <int> <chr> <chr> <chr> <int>  <LINESTRING [°]>
#> 1     1     2 a     d     f         1      (1 0.5, 1 0)
#> 2     4     5 c     f     m         4        (1 1, 2 1)
#> 3     2     3 b     <NA>  <NA>     NA (1 0, 2 0, 2 0.5)

We can notice that 1) there are 3 edges (instead of 2) since the morpher didn’t merge edges with different values of “type1” column; 2) the column “type1” has value “b” for the new edge since both merged edges have the value “b”.

Plot

tm_shape(new_transformation %E>% st_as_sf(), ext = 1.4) +
  tm_lines(lwd = 2, col = "type1") +
tm_shape(new_transformation %N>% st_as_sf()) +
  tm_dots(size = 0.3) +
tm_graticules(lines = FALSE)

The same code should also work with multiple fields.

convert(fake_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = c("type1", "type2"))
#> # A sfnetwork with 5 nodes and 3 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # An unrooted forest with 2 trees with spatially explicit edges
#> #
#> # Node Data:     5 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>      geometry
#>   <POINT [°]>
#> 1     (1 0.5)
#> 2       (1 0)
#> 3     (2 0.5)
#> 4       (1 1)
#> 5       (2 1)
#> #
#> # Edge Data:     3 x 7
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>    from    to type1 type2 type3 type4          geometry
#>   <int> <int> <chr> <chr> <chr> <int>  <LINESTRING [°]>
#> 1     1     2 a     d     f         1      (1 0.5, 1 0)
#> 2     4     5 c     f     m         4        (1 1, 2 1)
#> 3     2     3 b     e     <NA>     NA (1 0, 2 0, 2 0.5)

If the combination of pseudo-nodes and extra_fields remove all pseudo nodes (i.e. no node satisfies the new definition), then the morpher should return the input data.

convert(fake_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = c("type1", "type2", "type3"))
#> # A sfnetwork with 6 nodes and 4 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # An unrooted forest with 2 trees with spatially explicit edges
#> #
#> # Node Data:     6 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>      geometry
#>   <POINT [°]>
#> 1     (1 0.5)
#> 2       (1 0)
#> 3       (2 0)
#> 4     (2 0.5)
#> 5       (1 1)
#> 6       (2 1)
#> #
#> # Edge Data:     4 x 7
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 0 xmax: 2 ymax: 1
#>    from    to type1 type2 type3 type4         geometry
#>   <int> <int> <chr> <chr> <chr> <int> <LINESTRING [°]>
#> 1     1     2 a     d     f         1     (1 0.5, 1 0)
#> 2     2     3 b     e     g         2       (1 0, 2 0)
#> 3     3     4 b     e     h         3     (2 0, 2 0.5)
#> # ... with 1 more row

It returns an error when one or more fields do not exist in edges table:

convert(fake_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = c("type1", "XXX"))
#> Error: One or more fields in extra_fields do not exist in edges graph

We can apply the same idea to roxel data.

(roxel_sfn <- as_sfnetwork(roxel))
#> # A sfnetwork with 701 nodes and 851 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Node Data:     701 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> 4  (7.53209 51.95328)
#> 5 (7.532709 51.95209)
#> 6 (7.532869 51.95257)
#> # ... with 695 more rows
#> #
#> # Edge Data:     851 x 5
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name           type                                       geometry
#>   <int> <int> <chr>          <fct>                              <LINESTRING [°]>
#> 1     1     2 Havixbecker S~ residen~     (7.533722 51.95556, 7.533461 51.95576)
#> 2     3     4 Pienersallee   seconda~ (7.532442 51.95422, 7.53236 51.95377, 7.5~
#> 3     5     6 Schulte-Bernd~ residen~ (7.532709 51.95209, 7.532823 51.95239, 7.~
#> # ... with 848 more rows

Default behaviour

(roxel_sfn_smooth <- convert(roxel_sfn, to_spatial_smooth, .clean = TRUE))
#> # A sfnetwork with 652 nodes and 802 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Node Data:     652 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94152 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> 4  (7.53209 51.95328)
#> 5 (7.532709 51.95209)
#> 6 (7.532869 51.95257)
#> # ... with 646 more rows
#> #
#> # Edge Data:     802 x 5
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name           type                                       geometry
#>   <int> <int> <chr>          <fct>                              <LINESTRING [°]>
#> 1     1     2 Havixbecker S~ residen~     (7.533722 51.95556, 7.533461 51.95576)
#> 2     3     4 Pienersallee   seconda~ (7.532442 51.95422, 7.53236 51.95377, 7.5~
#> 3     5     6 Schulte-Bernd~ residen~ (7.532709 51.95209, 7.532823 51.95239, 7.~
#> # ... with 799 more rows

Smoothing only when two incident edges have the same “type”:

(roxel_sfn_smooth_v2 <- convert(roxel_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = "type"))
#> # A sfnetwork with 677 nodes and 827 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Node Data:     677 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> 4  (7.53209 51.95328)
#> 5 (7.532709 51.95209)
#> 6 (7.532869 51.95257)
#> # ... with 671 more rows
#> #
#> # Edge Data:     827 x 5
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name           type                                       geometry
#>   <int> <int> <chr>          <fct>                              <LINESTRING [°]>
#> 1     1     2 Havixbecker S~ residen~     (7.533722 51.95556, 7.533461 51.95576)
#> 2     3     4 Pienersallee   seconda~ (7.532442 51.95422, 7.53236 51.95377, 7.5~
#> 3     5     6 Schulte-Bernd~ residen~ (7.532709 51.95209, 7.532823 51.95239, 7.~
#> # ... with 824 more rows

The second version of the smoother has more edges (since there are fewer pseudo nodes)

igraph::ecount(roxel_sfn_smooth_v2); igraph::ecount(roxel_sfn_smooth)
#> [1] 827
#> [1] 802

but there are no NA values in the “type” column of roxel_sfn_smooth_v2

table(roxel_sfn_smooth %E>% pull(type), useNA = "always")
#> 
#>     cycleway      footway         path   pedestrian  residential    secondary 
#>           34           60           73            1          373           51 
#>      service        track unclassified         <NA> 
#>          151            5           17           37
table(roxel_sfn_smooth_v2 %E>% pull(type), useNA = "always")
#> 
#>     cycleway      footway         path   pedestrian  residential    secondary 
#>           39           65           89            1          392           57 
#>      service        track unclassified         <NA> 
#>          158            5           21            0

The main problem is that the new code has a “nonnegligible” impact on larger networks. For example

andorra <- osmextract::oe_get("Andorra", quiet = TRUE)
andorra_sfn <- as_sfnetwork(andorra, directed = FALSE)

system.time({convert(andorra_sfn, to_spatial_smooth, .clean = TRUE)})
#>    user  system elapsed 
#>    2.76    0.05    2.86
system.time({convert(andorra_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = "highway")})
#>    user  system elapsed 
#>    8.78    0.04    9.13

Most of the time is spent running igraph::incident\_edges (see here). Moreover, I don’t like the lapply approach running == for testing equality.

Created on 2021-05-15 by the reprex package (v2.0.0)

agila5 commented 3 years ago

Most of the time is spent running igraph::incident_edges (see here). Moreover, I don’t like the lapply approach running == for testing equality.

I found that most of the computing time in igraph::incident_edges is spent running the following code:

if (igraph_opt("return.vs.es")) {
  res <- lapply(res, function(x) create_es(graph, x + 1))
}

and from the help page of ?igraph::igraph_opt we can read that

"return.vs.es": "Whether functions that return a set or sequence of vertices/edges should return formal vertex/edge sequence objects."

We don't need that feature in that part of the code (since igraph::edge.attributes() can work with any integer input), so I (temporarily) set it to FALSE and restore the default option immediately after running the code.

The new approach is also slightly faster than the current code (probably because it needs to process fewer nodes):

suppressPackageStartupMessages({
  library(sf)
  library(tidygraph)
  library(sfnetworks)
})

andorra <- osmextract::oe_get("Andorra", quiet = TRUE)
andorra_sfn <- as_sfnetwork(andorra, directed = FALSE)

system.time({convert(andorra_sfn, to_spatial_smooth, .clean = TRUE)})
#>    user  system elapsed 
#>    3.92    0.14    4.23
system.time({convert(andorra_sfn, to_spatial_smooth, .clean = TRUE, extra_fields = "highway")})
#>    user  system elapsed 
#>    3.36    0.04    3.61

Created on 2021-05-20 by the reprex package (v2.0.0)

I'm not sure how to fix the other parts of the code (mainly the elementwise comparisons).

luukvdmeer commented 2 years ago

Hi @agila5 thanks for this! I included the edits in my updated to_spatial_smooth implementation. See the develop branch