luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

Sf methods for sfnetwork objects #18

Closed loreabad6 closed 4 years ago

loreabad6 commented 4 years ago

Documenting what sf functions should have a method for an object of class sfnetwork:

Function Implemented Comment
geometric binary predicates yes Output: (sparse) matrix
st_crs yes *
st_set_crs yes *
st_bbox yes **
st_make_grid yes **
st_geometry yes **
st_transform no *
st_crop no *
st_join, st_filter no ?
geometric unary operations no ***, all these functions should be applied to sf objects BEFORE transforming to an sfnetwork
geometric operations no *, If an edge is cut after an st_intersection should a new node be created?
geometric measurements no ***, st_length only for edges, tidygraph::morph method?
combine functions no ?
tidyverse methods yes

* Should the function be applied to both edges and nodes? ** When edges are active, and are not LINESTRINGs, shows sf error: no geometry column detected. *** Not relevant for an sfnetwork object?

luukvdmeer commented 4 years ago

Several sf methods are implemented now. This works as follows: the sf functions are applied to the active element of the network (hence, either the nodes or edges). Internally, the active element is converted to an sf object, and then the sf function is applied. Depending on the function, the results are merged back into the network, and an sfnetwork object is returned, or simply the output of the sf function is returned. See the overview below.

In an sfnetwork object, the nodes are spatially explicit (hence, have a geometry list column) by definition. The edges can optionally be spatially explicit. Of course, the sf functions only work on the edges when they are spatially explicit. If not, sf will throw an error that no geometry list column could be found. That makes sense to me.

Although the sf functions are applied only to the active element of the network, that does not mean that they don't affect the other element in any case. For example, when filtering the nodes, the edges get filtered as well. See the overview below.

First, create an sf network object (as you see we still need a print method).

library(sfnetworks)
library(sf)
library(tidygraph)

net = as_sfnetwork(roxel)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Node Data: 701 x 1 (active)
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
4  (7.53209 51.95328)
5 (7.532709 51.95209)
6 (7.532869 51.95257)
# … with 695 more rows
#
# Edge Data: 851 x 5
   from    to name                  type                                                         geometry
  <int> <int> <fct>                 <fct>                                                <LINESTRING [°]>
1     1     2 Havixbecker Strasse   residential                    (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee          secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)
# … with 848 more rows

The active element can always be extracted with st_as_sf (if spatially explicit).

# Nodes are activated by default.
net %>% st_as_sf()
Simple feature collection with 701 features and 0 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
CRS:            EPSG:4326
# A tibble: 701 x 1
              geometry
           <POINT [°]>
 1 (7.533722 51.95556)
 2 (7.533461 51.95576)
 3 (7.532442 51.95422)
 4  (7.53209 51.95328)
 5 (7.532709 51.95209)
 6 (7.532869 51.95257)
 7 (7.540063 51.94468)
 8  (7.53822 51.94546)
 9  (7.537673 51.9475)
10 (7.537614 51.94562)
# … with 691 more rows
net %>% activate('edges') %>% st_as_sf()
Simple feature collection with 851 features and 4 fields
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
CRS:            EPSG:4326
# A tibble: 851 x 5
    from    to name             type                                                               geometry
   <int> <int> <fct>            <fct>                                                      <LINESTRING [°]>
 1     1     2 Havixbecker Str… resident…                            (7.533722 51.95556, 7.533461 51.95576)
 2     3     4 Pienersallee     secondary           (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
 3     5     6 Schulte-Bernd-S… resident…         (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)
 4     7     8 NA               path      (7.540063 51.94468, 7.539696 51.94479, 7.539466 51.94491, 7.5392…
 5     9    10 Welsingheide     resident…                             (7.537673 51.9475, 7.537614 51.94562)
 6    11    12 NA               footway            (7.543791 51.94733, 7.54369 51.94686, 7.543751 51.94677)
 7    13    14 NA               footway                               (7.54012 51.94478, 7.539931 51.94514)
 8     8    10 NA               path      (7.53822 51.94546, 7.538131 51.94549, 7.538027 51.94552, 7.53761…
 9     7    15 NA               track     (7.540063 51.94468, 7.540338 51.94468, 7.540591 51.94474, 7.5412…
10    16    17 NA               track     (7.5424 51.94599, 7.54205 51.94629, 7.541967 51.94633, 7.541903 …
# … with 841 more rows

Functions that preserve the network structure

These are functions that merge the output of the sf function back into the network.

Filtering

The method for st_filter allows to filter nodes or edges by a spatial predicate. For example, we only want to keep those nodes that intersect with a certain sf object.

Important to notice here is that when filtering the nodes, edges that start or end at removed nodes, are removed as well. This behavior is not symmetric: when filtering the edges, nodes that are at the start or end of the removed edges, remain, even when they don't connect to any other edge. This is because in graph theory, edges can not exist without terminal nodes, while nodes can exist without being connected by edges.

In the sfnetwork method for st_filter(x,y), argument y can also be an sfnetwork object. Just as for object x, the active element of the network y will be internally be converted to an sf object, and used as such inside the sf function. That is, st_filter(net, net) is equal to st_filter(net, st_as_sf(net)). This behavior is the same for all methods with two input objects.

st_crop works similar, but allows to filter by a rectangle, like a bbox object (the st_bbox function also has an sfnetwork method, by the way).

net %>% st_filter(roxel[1:3, ], .predicate = st_intersects)
# A tbl_graph: 6 nodes and 3 edges
#
# A rooted forest with 3 trees
#
# Node Data: 6 x 1 (active)
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
4  (7.53209 51.95328)
5 (7.532709 51.95209)
6 (7.532869 51.95257)
#
# Edge Data: 3 x 5
   from    to name                  type                                                         geometry
  <int> <int> <fct>                 <fct>                                                <LINESTRING [°]>
1     1     2 Havixbecker Strasse   residential                    (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee          secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)

Joining

The method for st_join allows to add extra information to nodes or edges, based on a spatial join. By default, this join is a left join, meaning that all nodes/edges of x remain. If changed to a inner join (by adding left = FALSE), nodes/edges of x get removed if they don't have a match with any row in y. Here, the same principle as above is valid: removing nodes also removes edges, but not vice versa.

The sfnetwork for st_join has an important restriction. When joining on the nodes, only joins where there is at most one match per feature are allowed. Allowing multiple matches is a problem when joining on the nodes. For example, if node 1 in x has two matches in y, this creates two rows for node 1. Firstly, this messes up the indices of the edges (if edge 1 used to go from node 1 to node 2, it now still goes from node 1 to node 2, but node 2 is suddenly the same node as node 1, and the 'old' node 2 is now node 3). Secondly, if we manage to solve that, should edge 1 go from the new node 1 to the new node 3, or from the new node 2 to the new node 3? In the future we might remove the restriction, but only after we decide on a good way to deal with multiple matches per node.

For the edges this problem does not exist, because the same nodes can be connected by multiple edges. Multiple matches will cause edges to be duplicated, which might not be so useful, but at least it does not break the rules of graph theory.

y = net %>% slice(1:3) %>% mutate(foo = c('a', 'b', 'c'))
net %>% st_join(y, join = st_intersects, left = FALSE)
# A tbl_graph: 3 nodes and 1 edges
#
# A rooted forest with 2 trees
#
# Node Data: 3 x 2 (active)
             geometry foo  
          <POINT [°]> <chr>
1 (7.533722 51.95556) a    
2 (7.533461 51.95576) b    
3 (7.532442 51.95422) c    
#
# Edge Data: 1 x 5
   from    to name                type                                      geometry
  <int> <int> <fct>               <fct>                             <LINESTRING [°]>
1     1     2 Havixbecker Strasse residential (7.533722 51.95556, 7.533461 51.95576)
net %>% activate('edges') %>% st_join(y, join = st_intersects, left = FALSE)
# A tbl_graph: 701 nodes and 11 edges
#
# A directed multigraph with 692 components
#
# Node Data: 701 x 1 (active)
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
4  (7.53209 51.95328)
5 (7.532709 51.95209)
6 (7.532869 51.95257)
# … with 695 more rows
#
# Edge Data: 11 x 6
   from    to name                type                                                       geometry foo  
  <int> <int> <fct>               <fct>                                              <LINESTRING [°]> <chr>
1     1     2 Havixbecker Strasse residential                  (7.533722 51.95556, 7.533461 51.95576) a    
2     1     2 Havixbecker Strasse residential                  (7.533722 51.95556, 7.533461 51.95576) b    
3     3     4 Pienersallee        secondary   (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328) c    
# … with 8 more rows
z = rbind(net %>% slice(1) %>% st_as_sf(), net %>% slice(1) %>% st_as_sf())
net %>% st_join(z)
Error in st_join.sfnetwork(., y) : 
 Multiple matches are not allowed when using st_join on an sfnetwork 

Setting or transforming CRS

Also the CRS functions technically are applied to the active element of the network. However, when changing the CRS of the active element (either by setting or transforming), the same operation is applied to the other element. This is because sfnetwork objects have the restriction that nodes and edges should always be in the same CRS.

The same applies when converting the coordinates with st_shift_longitude or st_wrap_dateline.

st_crs(net)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
net %>% st_transform(3035)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Node Data: 701 x 1 (active)
           geometry
        <POINT [m]>
1 (4151491 3207923)
2 (4151474 3207946)
3 (4151398 3207777)
4 (4151370 3207673)
5 (4151408 3207539)
6 (4151421 3207592)
# … with 695 more rows
#
# Edge Data: 851 x 5
   from    to name                  type                                                   geometry
  <int> <int> <fct>                 <fct>                                          <LINESTRING [m]>
1     1     2 Havixbecker Strasse   residential                  (4151491 3207923, 4151474 3207946)
2     3     4 Pienersallee          secondary   (4151398 3207777, 4151390 3207727, 4151370 3207673)
3     5     6 Schulte-Bernd-Strasse residential (4151408 3207539, 4151417 3207573, 4151421 3207592)
# … with 848 more rows

Replacing geometry

Replacing the geometry of nodes or edges comes with restrictions. First, there is for now a general restriction for sfnetwork objects that nodes can only be points, and edges (if spatially explicit) only lines. This is to keep things simple at first. I am not sure if there are use cases where other geometry types as nodes or edges make sense, but we can always allow that further down the road.

Additionally:

You can drop geometry list columns by setting the geometry to NULL. Dropping the geometries of the nodes changes the class of the object to tbl_graph instead of sfnetwork. Dropping the geometries of edges keeps sfnetwork as class. The edges just won't be spatially explicit anymore.

st_geometry(net)
Geometry set for 701 features 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
CRS:            EPSG:4326
First 5 geometries:
POINT (7.533722 51.95556)
POINT (7.533461 51.95576)
POINT (7.532442 51.95422)
POINT (7.53209 51.95328)
POINT (7.532709 51.95209)
geom = st_geometry(net)
net %>% st_set_geometry(geom)
Error in `st_geometry<-.sfnetwork`(`*tmp*`, value = value) : 
  Node geometries cannot be replaced when edges are spatially explicit 
class(net %>% st_set_geometry(NULL))
[1] "tbl_graph" "igraph" 
geom = st_geometry(roxel)
net %>% activate('edges') %>% st_set_geometry(geom)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Edge Data: 851 x 5 (active)
   from    to name             type                                                                geometry
  <int> <int> <fct>            <fct>                                                       <LINESTRING [°]>
1     1     2 Havixbecker Str… resident…                             (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee     secondary            (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-S… resident…          (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)
4     7     8 NA               path      (7.540063 51.94468, 7.539696 51.94479, 7.539466 51.94491, 7.53923…
5     9    10 Welsingheide     resident…                              (7.537673 51.9475, 7.537614 51.94562)
6    11    12 NA               footway             (7.543791 51.94733, 7.54369 51.94686, 7.543751 51.94677)
# … with 845 more rows
#
# Node Data: 701 x 1
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
# … with 698 more rows
geom = st_geometry(st_centroid(roxel))
net %>% activate('edges') %>% st_set_geometry(geom)
 Error in `st_geometry<-.sfnetwork`(`*tmp*`, value = value) : 
  Only geometries of type LINESTRING are allowed as edges 
geom = st_geometry(st_transform(roxel, 3035))
net %>% activate('edges') %>% st_set_geometry(geom)
 Error in `st_geometry<-.sfnetwork`(`*tmp*`, value = value) : 
  Edge geometries can only be replaced when the CRS doesn't change 
geom = st_geometry(st_reverse(roxel))
net %>% activate('edges') %>% st_set_geometry(geom)
Error in `st_geometry<-.sfnetwork`(`*tmp*`, value = value) : 
  Edge geometries can only be replaced when the endpoints don't change

Geometric unary operations

You can also replace the geometries by applying a geometric unary operation. However, only a limited set of these operations are supported, namely those which:

The operations that satisfy those conditions are st_segmentize, st_simplify and st_reverse. As said, it only makes sense to apply them to the edges, since the nodes are points and will not change in any of these operations.

Edit: It seems st_segmentize does not always keep the same endpoints. See here.

Calling st_reverse on the edges is special, since it will automatically also switch the to and from column of the edges.

We might want to implement morphers that allow you to create temporary alternative geometry representations like buffers and centroids. See here.

net %>% activate('edges') %>% st_reverse()
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Edge Data: 851 x 5 (active)
   from    to name             type                                                                geometry
  <int> <int> <fct>            <fct>                                                       <LINESTRING [°]>
1     2     1 Havixbecker Str… resident…                             (7.533461 51.95576, 7.533722 51.95556)
2     4     3 Pienersallee     secondary            (7.53209 51.95328, 7.53236 51.95377, 7.532442 51.95422)
3     6     5 Schulte-Bernd-S… resident…          (7.532869 51.95257, 7.532823 51.95239, 7.532709 51.95209)
4     8     7 NA               path      (7.53822 51.94546, 7.538353 51.94542, 7.538517 51.94538, 7.538645…
5    10     9 Welsingheide     resident…                              (7.537614 51.94562, 7.537673 51.9475)
6    12    11 NA               footway             (7.543751 51.94677, 7.54369 51.94686, 7.543791 51.94733)
# … with 845 more rows
#
# Node Data: 701 x 1
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
# … with 698 more rows
Warning message:
In st_reverse.sfnetwork(.) :
  Reversing the edges exchanges the 'to' and 'from' columns

Functions that don't preserve the network structure

This are functions that return either vectors or (sparse) matrices. They can be called directly on an sfnetwork object and will be applied to the active element of the network. This skips you the step of first running st_as_sf(). That is not such a huge advantage. However, the sfnetwork methods mainly come in handy because you can now call them inside dplyr verbs like mutate, when applying those to an sfnetwork object.

Geometric binary predicates

All geometric binary predicates of sf are supported. You can use them directly:

y = net %>% slice(1:3)
net %>% st_intersects(y)
Sparse geometry binary predicate list of length 701, where the predicate was `intersects'
first 10 elements:
 1: 1
 2: 2
 3: 3
 4: (empty)
 5: (empty)
 6: (empty)
 7: (empty)
 8: (empty)
 9: (empty)
 10: (empty)

But also inside, for example, mutate calls:

net %>% mutate(int = lengths(st_intersects(., y)) > 0)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Node Data: 701 x 2 (active)
             geometry int  
          <POINT [°]> <lgl>
1 (7.533722 51.95556) TRUE 
2 (7.533461 51.95576) TRUE 
3 (7.532442 51.95422) TRUE 
4  (7.53209 51.95328) FALSE
5 (7.532709 51.95209) FALSE
6 (7.532869 51.95257) FALSE
# … with 695 more rows
#
# Edge Data: 851 x 5
   from    to name                  type                                                         geometry
  <int> <int> <fct>                 <fct>                                                <LINESTRING [°]>
1     1     2 Havixbecker Strasse   residential                    (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee          secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)
# … with 848 more rows

Geometric measures

The geometric measures st_area, st_distance and st_length also have sfnetwork methods.

net %>% activate('edges') %>% mutate(length = st_length(.))
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Edge Data: 851 x 6 (active)
   from    to name           type                                                         geometry   length
  <int> <int> <fct>          <fct>                                                <LINESTRING [°]>      [m]
1     1     2 Havixbecker S… resident…                      (7.533722 51.95556, 7.533461 51.95576)  28.859…
2     3     4 Pienersallee   secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328) 107.704…
3     5     6 Schulte-Bernd… resident…   (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)  54.362…
4     7     8 NA             path      (7.540063 51.94468, 7.539696 51.94479, 7.539466 51.94491, … 155.230…
5     9    10 Welsingheide   resident…                       (7.537673 51.9475, 7.537614 51.94562) 208.663…
6    11    12 NA             footway      (7.543791 51.94733, 7.54369 51.94686, 7.543751 51.94677)  63.023…
# … with 845 more rows
#
# Node Data: 701 x 1
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
# … with 698 more rows

Other functions that return vectors or matrices and have sfnetwork methods are dimension, simplicity, validity or is_empty queries, st_relate, st_nearest_feature, st_coordinates, st_is and st_geometry_type.

Tidyverse methods

Because internally the nodes and edges are handles as being sf objects, automatically the sf methods of tidyverse functions are used when applied to an sfnetwork object. For example, geometry is sticky when applying select to an sfnetwork object (the to and from column are too, by the way. This comes from tidygraph).

net %>% activate('edges') %>% select(type)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Edge Data: 851 x 4 (active)
   from    to type                                                                                 geometry
  <int> <int> <fct>                                                                        <LINESTRING [°]>
1     1     2 residential                                            (7.533722 51.95556, 7.533461 51.95576)
2     3     4 secondary                             (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 residential                         (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)
4     7     8 path        (7.540063 51.94468, 7.539696 51.94479, 7.539466 51.94491, 7.539237 51.94502, 7.5…
5     9    10 residential                                             (7.537673 51.9475, 7.537614 51.94562)
6    11    12 footway                              (7.543791 51.94733, 7.54369 51.94686, 7.543751 51.94677)
# … with 845 more rows
#
# Node Data: 701 x 1
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
# … with 698 more rows

Functions that are not supported

Not all sf functions have sfnetwork methods.

These functions cannot be directly applied to an sfnetwork object, but of course you can always escape the network structure, do some work, and merge changes back in if possible. For example, when I want to know the area of the voronoi polygon of each node.

y = net %>%
  st_transform(3035) %>%
  st_as_sf() %>%
  mutate(area_of_influence = st_area(st_collection_extract(st_voronoi(do.call(c, st_geometry(.))))))

net %>%
  st_transform(3035) %>%
  st_join(y) %>%
  st_transform(4326)
# A tbl_graph: 701 nodes and 851 edges
#
# A directed multigraph with 14 components
#
# Node Data: 701 x 2 (active)
             geometry area_of_influence
          <POINT [°]>             <dbl>
1 (7.533722 51.95556)          1837540.
2 (7.533461 51.95576)          2563698.
3 (7.532442 51.95422)            36462.
4  (7.53209 51.95328)           359624.
5 (7.532709 51.95209)            12954.
6 (7.532868 51.95257)            11089.
# … with 695 more rows
#
# Edge Data: 851 x 5
   from    to name                  type                                                         geometry
  <int> <int> <fct>                 <fct>                                                <LINESTRING [°]>
1     1     2 Havixbecker Strasse   residential                    (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee          secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.532868 51.95257)
# … with 848 more rows
Robinlovelace commented 4 years ago

Excellent work @luukvdmeer, it seems much of the functionality is there already. Do you think it's worth making some of your detailed post above part of a vignette?

luukvdmeer commented 4 years ago

Yes, I am planning on that. I do need to do some more work on method for specific tidygraph verbs before everything works smoothly.

luukvdmeer commented 4 years ago

Merged to master!