Threshold parameter in the definition of an sfnetwork object

luukvdmeer / sfnetworks

Tidy Geospatial Networks in R

https://luukvdmeer.github.io/sfnetworks/

Other

347 stars 20 forks source link

Threshold parameter in the definition of an sfnetwork object #71

Closed agila5 closed 3 years ago

agila5 commented 4 years ago

Is your feature request related to a problem? Please describe. I would like creating a new parameter in as_sfnetwork() (and similar functions) that is used to specify a maximum threshold indicating the tolerance to use when checking for the nodes in a network. A similar parameter exists in stplanr::SpatialLinesNetwork but I'm not sure about the implementation.

The problem was introduced here: https://gis.stackexchange.com/questions/370640/how-to-connect-edges-in-a-network-even-if-they-dont-exactly-match-spatially

Reprex of the problem:

# packages
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(sfnetworks)

# data
pts1 <-  matrix(1:4, 2)
ls1 <-  st_linestring(pts1) 
pts2 <-  matrix(31:34, ,2)
pts2[1,1] <- 1.00005 
pts2[1,2] <- 3.00005
ls2 <-  st_linestring(pts2)
ls1; ls2
#> LINESTRING (1 3, 2 4)
#> LINESTRING (1.00005 3.00005, 32 34)

obj <- st_sf(geometry = c(st_geometry(ls1), st_geometry(ls2)))
as_sfnetwork(obj)
#> # An sfnetwork with 4 nodes and 2 edges
#> #
#> # CRS:  NA 
#> #
#> # A rooted forest with 2 trees with spatially explicit edges
#> #
#> # Node Data:     4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>            geometry
#>             <POINT>
#> 1             (1 3)
#> 2             (2 4)
#> 3 (1.00005 3.00005)
#> 4           (32 34)
#> #
#> # Edge Data:     2 x 3
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>    from    to                 geometry
#>   <int> <int>             <LINESTRING>
#> 1     1     2               (1 3, 2 4)
#> 2     3     4 (1.00005 3.00005, 32 34)

# Here the nodes are distinct

st_precision(obj) <- 1
as_sfnetwork(obj)
#> # An sfnetwork with 4 nodes and 2 edges
#> #
#> # CRS:  NA 
#> #
#> # A rooted forest with 2 trees with spatially explicit edges
#> #
#> # Node Data:     4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>            geometry
#>             <POINT>
#> 1             (1 3)
#> 2             (2 4)
#> 3 (1.00005 3.00005)
#> 4           (32 34)
#> #
#> # Edge Data:     2 x 3
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>    from    to                 geometry
#>   <int> <int>             <LINESTRING>
#> 1     1     2               (1 3, 2 4)
#> 2     3     4 (1.00005 3.00005, 32 34)

^{Created on 2020-08-28 by the reprex package (v0.3.0)}

Describe the solution you'd like The nodes that are closer than a certain threshold (which defaults to 0) should be merged into a unique node.

luukvdmeer commented 4 years ago

So point (1,3) and point (1.0005, 3.0005) should be treated as being the same node in the network right? So these two points are in the network represented by a single node. To me that raises the question: what coordinates should this combined node have? (1,3), (1.0005, 3.0005), or something in between? I would suggest to pre-process the data accordingly, before creating a network. For example:

# packages
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfnetworks)
library(magrittr)

# data
pts1 <-  matrix(1:4, 2)
ls1 <-  st_linestring(pts1) 
pts2 <-  matrix(31:34, ,2)
pts2[1,1] <- 1.00005 
pts2[1,2] <- 3.00005
ls2 <-  st_linestring(pts2)
ls1; ls2
#> LINESTRING (1 3, 2 4)
#> LINESTRING (1.00005 3.00005, 32 34)

obj <- st_sf(geometry = c(st_geometry(ls1), st_geometry(ls2)))

# round coordinates
st_geometry(obj) = st_geometry(obj) %>%
  lapply(function(x) round(x, 0)) %>%
  st_sfc(crs = st_crs(obj))

obj
#> Simple feature collection with 2 features and 0 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> CRS:            NA
#>                  geometry
#> 1   LINESTRING (1 3, 2 4)
#> 2 LINESTRING (1 3, 32 34)

# create network
as_sfnetwork(obj)
#> # An sfnetwork with 3 nodes and 2 edges
#> #
#> # CRS:  NA 
#> #
#> # A rooted tree with spatially explicit edges
#> #
#> # Node Data:     3 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>   geometry
#>    <POINT>
#> 1    (1 3)
#> 2    (2 4)
#> 3  (32 34)
#> #
#> # Edge Data:     2 x 3
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>    from    to     geometry
#>   <int> <int> <LINESTRING>
#> 1     1     2   (1 3, 2 4)
#> 2     1     3 (1 3, 32 34)

^{Created on 2020-08-30 by the reprex package (v0.3.0)}

luukvdmeer commented 4 years ago

Another option would be to keep the original nodes, but draw extra edges between those nodes that are within distance x from each other. A fast and dirty implementation, see below.

Note I use some internal sfnetworks functions, maybe we can export some of those. Note also that this fast implementation does not consider attributes, but only the geometry column, and will duplicate original edges when they are themselves of a distance < x. However, this is just to showcase ;) The idea is the same: first pre-process the data into the desired format, and only then create the network.

# packages
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfnetworks)
library(magrittr)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names

# data
pts1 <-  matrix(1:4, 2)
ls1 <-  st_linestring(pts1) 
pts2 <-  matrix(31:34, ,2)
pts2[1,1] <- 1.00005 
pts2[1,2] <- 3.00005
ls2 <-  st_linestring(pts2)
obj <- st_sf(geometry = c(st_geometry(ls1), st_geometry(ls2)))

connect_close_nodes = function(x, threshold) {
  # get boundary points of the edges
  nodes = sfnetworks:::get_boundary_points(obj)
  # compute distance matrix with these nodes
  dist_mat = st_distance(nodes)
  # connect those boundary points that are closer than the threshold distance
  connections = which(dist_mat < threshold, arr.ind = TRUE) %>%
    apply(1, function(x) if (x[1] != x[2]) {sfnetworks:::points_to_line(nodes[x[1],], nodes[x[2],])}) %>%
    compact() %>%
    reduce(c)
  # combine the original edges with the newly created connections
  c(st_geometry(x), connections)
}

obj = st_as_sf(connect_close_nodes(obj, threshold = 0.0001))
obj
#> Simple feature collection with 4 features and 0 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> CRS:            NA
#>                                x
#> 1          LINESTRING (1 3, 2 4)
#> 2 LINESTRING (1.00005 3.00005...
#> 3 LINESTRING (1.00005 3.00005...
#> 4 LINESTRING (1 3, 1.00005 3....

as_sfnetwork(obj)
#> # An sfnetwork with 4 nodes and 4 edges
#> #
#> # CRS:  NA 
#> #
#> # A directed simple graph with 1 component with spatially explicit edges
#> #
#> # Node Data:     4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>                   x
#>             <POINT>
#> 1             (1 3)
#> 2             (2 4)
#> 3 (1.00005 3.00005)
#> 4           (32 34)
#> #
#> # Edge Data:     4 x 3
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 1 ymin: 3 xmax: 32 ymax: 34
#>    from    to                        x
#>   <int> <int>             <LINESTRING>
#> 1     1     2               (1 3, 2 4)
#> 2     3     4 (1.00005 3.00005, 32 34)
#> 3     3     1   (1.00005 3.00005, 1 3)
#> # … with 1 more row

^{Created on 2020-08-30 by the reprex package (v0.3.0)}

agila5 commented 4 years ago

Hi!

So point (1,3) and point (1.0005, 3.0005) should be treated as being the same node in the network right? So these two points are in the network represented by a single node. To me that raises the question: what coordinates should this combined node have? (1,3), (1.0005, 3.0005), or something in between?

That's a good question, thanks. I haven't thought about that problem since I erroneously assumed that the obvious solution would be to round both points at (1, 3).

I checked your examples, and, IMO, the best approach is the first one and it completely fixes this issue.

Would you consider adding an extra threshold or tolerance argument to as_sfnetworks.sf? Something like if (threshold > 0) { your code to round goes here}. The only downside that I can think of is that if we modify the input sf object in as_sfnetwork.sf, then we cannot recover the original sf object from the sfnetwork object. If you don't want to add the extra argument, then I would simply add this example to the vignette to document this behaviour.

Robinlovelace commented 4 years ago

Just looked at this discussion and I think an argument in as_sfnetworks.sf() called tolerance threshold or even snap is a a great idea.

loreabad6 commented 4 years ago

Hi! If I may pitch in this conversation, I think adding extra parameters makes our internal functions too complex. For example, if we add such a parameter, maybe also one for cleaning the network is also worth adding, and so on and so on (which then will also probably increase the number of dependencies). I am rather of the idea of pre-processing, and in that line what we could do is add a new vignette with common pre-processing steps before converting into an sfnetwork, overall how to prepare your data. We could then include this tolerance, maybe grass v.clean, and any other issue that might be worth. What do you think?

agila5 commented 4 years ago

Good morning! You and @luukvdmeer worked on the internals of sfnetworks and you know the details much better than me, so if you prefer creating a new vignette explaining the pre-processing steps instead of adding new parameters I'm 100% fine with that.

Robinlovelace commented 4 years ago

Good point about making functions too complex. Also happy if this functionality goes into another function for network preprocessing :+1:

luukvdmeer commented 3 years ago

This is not implemented inside a function, but now clearly explained in a separate section of the network pre-processing and cleaning vignette. Think that is enough for now to close the issue.

tomraster commented 1 year ago

Hello,

could you please share the internal functions:

sfnetworks:::points_to_line sfnetworks:::get_boundary_points

I cannot find them in current or past versions of the sfnetworks package.

agila5 commented 1 year ago

Hi @tomraster! The function get_boundary_points was originally coded in https://github.com/luukvdmeer/sfnetworks/commit/5f20ea3c6f95d89f0700355e896054659f505ef3 and it was originally defined as

get_boundary_points = function(x) {
  sf::st_cast(sf::st_boundary(sf::st_geometry(x)), "POINT")
}

The definition was updated in https://github.com/luukvdmeer/sfnetworks/commit/7dd71d024c437d86b4e1dd174e5553df9ebf2d29 as follows

get_boundary_points = function(x) {
  # 1a. extract coordinates
  x_coordinates <- sf::st_coordinates(x)

  # 1b. Find index of L1 column
  L1_index <- ncol(x_coordinates)

  # 1c. Remove colnames
  x_coordinates <- unname(x_coordinates)

  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, L1_index])
  last_pair <- !duplicated(x_coordinates[, L1_index], fromLast = TRUE)
  idxs <- first_pair | last_pair

  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- sf::st_cast(
    sf::st_sfc(
      sf::st_multipoint(x_pairs[, -L1_index]),
      crs = sf::st_crs(x)
    ),
   "POINT"
  )
  x_nodes
}

In https://github.com/luukvdmeer/sfnetworks/commit/281427945914a0bd032069247fe1e0961c2da2dd, it was renamed as linestring_boundary_points and its definition was improved in https://github.com/luukvdmeer/sfnetworks/commit/18acb5621b0ca7c1bd78ef78a0b25515147dceb8 to the current definition you can find on CRAN:

sfnetworks:::linestring_boundary_points
#> function (x) 
#> {
#>     coords = sfc_to_df(st_geometry(x))
#>     first_pair = !duplicated(coords[["sfg_id"]])
#>     last_pair = !duplicated(coords[["sfg_id"]], fromLast = TRUE)
#>     idxs = first_pair | last_pair
#>     pairs = coords[idxs, names(coords) %in% c("x", "y", "z", 
#>         "m")]
#>     points = sfc_point(pairs)
#>     st_crs(points) = st_crs(x)
#>     st_precision(points) = st_precision(x)
#>     points
#> }
#> <bytecode: 0x000000002b8adb78>
#> <environment: namespace:sfnetworks>

^{Created on 2023-06-10 with reprex v2.0.2}

The function points_to_line was defined in https://github.com/luukvdmeer/sfnetworks/commit/603b2a9a16f56689e3fb4de5465cc29ea5d98bcd as follows:

points_to_line = function(x, y) {
  sf::st_cast(sf::st_union(x, y), "LINESTRING")
}

Then, it was modified in https://github.com/luukvdmeer/sfnetworks/commit/73b3f656537342068f70ec3f69aedfc8ddb41590 as

points_to_line = function(x, y) {
  sf::st_linestring(c(x, y))
}

and finally it was removed in https://github.com/luukvdmeer/sfnetworks/commit/3b11df0e62c1fc2b49347ac64433b120c7f132c6 since, I think, you can just use draw_lines() to accomplish the same task.

tomraster commented 1 year ago

Amazing, thank you very very much!