agila5 closed this issue 3 years ago
So point (1,3) and point (1.0005, 3.0005) should be treated as the same node in the network, right? That is, the two points would be represented in the network by a single node. To me that raises the question: what coordinates should this combined node have? (1,3), (1.0005, 3.0005), or something in between? I would suggest pre-processing the data accordingly before creating a network. For example:
# packages
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfnetworks)
library(magrittr)
# data
pts1 <- matrix(1:4, 2)
ls1 <- st_linestring(pts1)
pts2 <- matrix(31:34, ,2)
pts2[1,1] <- 1.00005
pts2[1,2] <- 3.00005
ls2 <- st_linestring(pts2)
ls1; ls2
#> LINESTRING (1 3, 2 4)
#> LINESTRING (1.00005 3.00005, 32 34)
obj <- st_sf(geometry = c(st_geometry(ls1), st_geometry(ls2)))
# round coordinates
st_geometry(obj) = st_geometry(obj) %>%
  lapply(function(x) round(x, 0)) %>%
  st_sfc(crs = st_crs(obj))
obj
#> Simple feature collection with 2 features and 0 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> CRS: NA
#> geometry
#> 1 LINESTRING (1 3, 2 4)
#> 2 LINESTRING (1 3, 32 34)
# create network
as_sfnetwork(obj)
#> # An sfnetwork with 3 nodes and 2 edges
#> #
#> # CRS: NA
#> #
#> # A rooted tree with spatially explicit edges
#> #
#> # Node Data: 3 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> geometry
#> <POINT>
#> 1 (1 3)
#> 2 (2 4)
#> 3 (32 34)
#> #
#> # Edge Data: 2 x 3
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> from to geometry
#> <int> <int> <LINESTRING>
#> 1 1 2 (1 3, 2 4)
#> 2 1 3 (1 3, 32 34)
Created on 2020-08-30 by the reprex package (v0.3.0)
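The example above rounds to whole numbers, which is coarser than most real data allow. A minimal sketch of rounding to an arbitrary tolerance instead is given below. It works in base R on a plain coordinate matrix. `snap_coords` and its `tolerance` argument are hypothetical names used only for illustration; they are not part of sf or sfnetworks.

```r
# Hypothetical helper (not part of sf or sfnetworks): snap every coordinate
# to the nearest multiple of `tolerance`.
snap_coords = function(coords, tolerance) {
  stopifnot(is.numeric(coords), tolerance > 0)
  round(coords / tolerance) * tolerance
}

# the two nearly coincident endpoints from the example above
pts = rbind(c(1, 3), c(1.00005, 3.00005))
snapped = snap_coords(pts, tolerance = 0.001)

# check whether both rows now hold the same coordinates
# (compared with all.equal to allow for floating point error)
isTRUE(all.equal(snapped[1, ], snapped[2, ]))
```

Note that snapping to a grid is not the same as merging everything within a distance: two points closer than `tolerance` can still fall on opposite sides of a grid boundary and end up on different grid points.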
Another option would be to keep the original nodes, but draw extra edges between nodes that are within distance x of each other. A quick and dirty implementation is shown below. Note that I use some internal sfnetworks functions; maybe we can export some of those. Note also that this quick implementation considers only the geometry column, not the attributes, and will duplicate original edges when they are themselves shorter than x. However, this is just to showcase ;) The idea is the same: first pre-process the data into the desired format, and only then create the network.
# packages
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfnetworks)
library(magrittr)
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#>
#> set_names
# data
pts1 <- matrix(1:4, 2)
ls1 <- st_linestring(pts1)
pts2 <- matrix(31:34, ,2)
pts2[1,1] <- 1.00005
pts2[1,2] <- 3.00005
ls2 <- st_linestring(pts2)
obj <- st_sf(geometry = c(st_geometry(ls1), st_geometry(ls2)))
connect_close_nodes = function(x, threshold) {
  # get boundary points of the edges (use the argument x, not the global obj)
  nodes = sfnetworks:::get_boundary_points(x)
  # compute distance matrix with these nodes
  dist_mat = st_distance(nodes)
  # connect those boundary points that are closer than the threshold distance
  connections = which(dist_mat < threshold, arr.ind = TRUE) %>%
    apply(1, function(idx) if (idx[1] != idx[2]) {sfnetworks:::points_to_line(nodes[idx[1],], nodes[idx[2],])}) %>%
    compact() %>%
    reduce(c)
  # combine the original edges with the newly created connections
  c(st_geometry(x), connections)
}
obj = st_as_sf(connect_close_nodes(obj, threshold = 0.0001))
obj
#> Simple feature collection with 4 features and 0 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> CRS: NA
#> x
#> 1 LINESTRING (1 3, 2 4)
#> 2 LINESTRING (1.00005 3.00005...
#> 3 LINESTRING (1.00005 3.00005...
#> 4 LINESTRING (1 3, 1.00005 3....
as_sfnetwork(obj)
#> # An sfnetwork with 4 nodes and 4 edges
#> #
#> # CRS: NA
#> #
#> # A directed simple graph with 1 component with spatially explicit edges
#> #
#> # Node Data: 4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> x
#> <POINT>
#> 1 (1 3)
#> 2 (2 4)
#> 3 (1.00005 3.00005)
#> 4 (32 34)
#> #
#> # Edge Data: 4 x 3
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 1 ymin: 3 xmax: 32 ymax: 34
#> from to x
#> <int> <int> <LINESTRING>
#> 1 1 2 (1 3, 2 4)
#> 2 3 4 (1.00005 3.00005, 32 34)
#> 3 3 1 (1.00005 3.00005, 1 3)
#> # … with 1 more row
Created on 2020-08-30 by the reprex package (v0.3.0)
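For the "something in between" option raised earlier, one possible pre-processing step is to group endpoints that lie close together and move each of them to the mean position of its group. The sketch below does this in base R on a plain coordinate matrix, using single-linkage clustering from the stats package. `merge_close_points` is a hypothetical helper, not an sfnetworks function, and the choice of single linkage is an assumption made for illustration.

```r
# Hypothetical helper (not part of sf or sfnetworks): move points that lie
# close together to the mean position of their group.
merge_close_points = function(coords, threshold) {
  stopifnot(is.matrix(coords), nrow(coords) >= 2, threshold >= 0)
  # single-linkage clustering, cut at the threshold: points linked by a
  # chain of gaps no larger than the threshold end up in the same group
  tree = hclust(dist(coords), method = "single")
  groups = cutree(tree, h = threshold)
  # replace each coordinate by the mean of its group, column by column
  merged = apply(coords, 2, function(column) ave(column, groups))
  list(coords = merged, group = groups)
}

# endpoints of the two linestrings used in the examples above
endpoints = rbind(c(1, 3), c(2, 4), c(1.00005, 3.00005), c(32, 34))
result = merge_close_points(endpoints, threshold = 0.0001)
result$group   # group membership of each endpoint
result$coords  # endpoints after merging
```

With single linkage, a long chain of closely spaced points is merged into one group even when its two ends are far apart, so the threshold should be small relative to the shortest real edge.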
Hi!
> So point (1,3) and point (1.0005, 3.0005) should be treated as being the same node in the network right? So these two points are in the network represented by a single node. To me that raises the question: what coordinates should this combined node have? (1,3), (1.0005, 3.0005), or something in between?
That's a good question, thanks. I hadn't thought about that problem, since I erroneously assumed that the obvious solution would be to round both points to (1, 3).
I checked your examples and, IMO, the first approach is the best one: it completely fixes this issue.
Would you consider adding an extra `threshold` or `tolerance` argument to `as_sfnetwork.sf`? Something like `if (threshold > 0) { your code to round goes here }`. The only downside that I can think of is that if we modify the input `sf` object in `as_sfnetwork.sf`, then we cannot recover the original `sf` object from the `sfnetwork` object. If you don't want to add the extra argument, then I would simply add this example to the vignette to document this behaviour.
Just looked at this discussion and I think an argument in `as_sfnetwork.sf()` called `tolerance`, `threshold`, or even `snap` is a great idea.
Hi! If I may pitch in to this conversation, I think adding extra parameters makes our internal functions too complex. For example, if we add such a parameter, then one for cleaning the network may also be worth adding, and so on (which will probably also increase the number of dependencies). I would rather go for pre-processing, and along that line we could add a new vignette with common pre-processing steps to run before converting to an sfnetwork: overall, how to prepare your data. We could then include this tolerance, maybe GRASS `v.clean`, and any other issue that might be worth covering. What do you think?
Good morning! You and @luukvdmeer worked on the internals of `sfnetworks` and you know the details much better than me, so if you prefer creating a new vignette explaining the pre-processing steps instead of adding new parameters I'm 100% fine with that.
Good point about making functions too complex. Also happy if this functionality goes into another function for network preprocessing :+1:
This is not implemented inside a function, but it is now clearly explained in a separate section of the network pre-processing and cleaning vignette. I think that is enough for now to close the issue.
Hello,
could you please share the internal functions `sfnetworks:::points_to_line` and `sfnetworks:::get_boundary_points`?
I cannot find them in current or past versions of the sfnetworks package.
Hi @tomraster! The function `get_boundary_points` was originally coded in https://github.com/luukvdmeer/sfnetworks/commit/5f20ea3c6f95d89f0700355e896054659f505ef3, where it was defined as
get_boundary_points = function(x) {
  sf::st_cast(sf::st_boundary(sf::st_geometry(x)), "POINT")
}
The definition was updated in https://github.com/luukvdmeer/sfnetworks/commit/7dd71d024c437d86b4e1dd174e5553df9ebf2d29 as follows
get_boundary_points = function(x) {
  # 1a. extract coordinates
  x_coordinates <- sf::st_coordinates(x)
  # 1b. Find index of L1 column
  L1_index <- ncol(x_coordinates)
  # 1c. Remove colnames
  x_coordinates <- unname(x_coordinates)
  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, L1_index])
  last_pair <- !duplicated(x_coordinates[, L1_index], fromLast = TRUE)
  idxs <- first_pair | last_pair
  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- sf::st_cast(
    sf::st_sfc(
      sf::st_multipoint(x_pairs[, -L1_index]),
      crs = sf::st_crs(x)
    ),
    "POINT"
  )
  x_nodes
}
In https://github.com/luukvdmeer/sfnetworks/commit/281427945914a0bd032069247fe1e0961c2da2dd, it was renamed to `linestring_boundary_points`, and its definition was improved in https://github.com/luukvdmeer/sfnetworks/commit/18acb5621b0ca7c1bd78ef78a0b25515147dceb8 to the current definition you can find on CRAN:
sfnetworks:::linestring_boundary_points
#> function (x)
#> {
#> coords = sfc_to_df(st_geometry(x))
#> first_pair = !duplicated(coords[["sfg_id"]])
#> last_pair = !duplicated(coords[["sfg_id"]], fromLast = TRUE)
#> idxs = first_pair | last_pair
#> pairs = coords[idxs, names(coords) %in% c("x", "y", "z",
#> "m")]
#> points = sfc_point(pairs)
#> st_crs(points) = st_crs(x)
#> st_precision(points) = st_precision(x)
#> points
#> }
#> <bytecode: 0x000000002b8adb78>
#> <environment: namespace:sfnetworks>
Created on 2023-06-10 with reprex v2.0.2
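Both the updated and the current definitions above find the boundary points with the same indexing trick: a coordinate row is kept if it is the first or the last row carrying its linestring id. A small base R illustration on a made-up coordinate table follows. The `sfg_id`, `x` and `y` column names mirror the ones visible in the function body above, and the data are invented for the example.

```r
# Made-up coordinate table: linestring 1 has three vertices, linestring 2 has two
coords = data.frame(
  sfg_id = c(1, 1, 1, 2, 2),
  x = c(0, 1, 2, 2, 3),
  y = c(0, 1, 0, 0, 5)
)

# TRUE for the first row of each linestring id
first_pair = !duplicated(coords[["sfg_id"]])
# TRUE for the last row of each linestring id
last_pair = !duplicated(coords[["sfg_id"]], fromLast = TRUE)

# keep start and end points, drop interior vertices
boundary = coords[first_pair | last_pair, ]
boundary
```

Only the interior vertex of the first linestring (row 2) is dropped; the endpoint shared by both linestrings, (2, 0), appears twice because each linestring contributes its own boundary points.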
The function `points_to_line` was defined in https://github.com/luukvdmeer/sfnetworks/commit/603b2a9a16f56689e3fb4de5465cc29ea5d98bcd as follows:
points_to_line = function(x, y) {
  sf::st_cast(sf::st_union(x, y), "LINESTRING")
}
Then, it was modified in https://github.com/luukvdmeer/sfnetworks/commit/73b3f656537342068f70ec3f69aedfc8ddb41590 as
points_to_line = function(x, y) {
  sf::st_linestring(c(x, y))
}
and finally it was removed in https://github.com/luukvdmeer/sfnetworks/commit/3b11df0e62c1fc2b49347ac64433b120c7f132c6 since, I think, you can just use `draw_lines()` to accomplish the same task.
Amazing, thank you very very much!
Is your feature request related to a problem? Please describe.
I would like to add a new parameter to `as_sfnetwork()` (and similar functions) that specifies the maximum tolerance to use when checking whether nodes in a network coincide. A similar parameter exists in `stplanr::SpatialLinesNetwork`, but I'm not sure about the implementation. The problem was introduced here: https://gis.stackexchange.com/questions/370640/how-to-connect-edges-in-a-network-even-if-they-dont-exactly-match-spatially
Reprex of the problem:
Created on 2020-08-28 by the reprex package (v0.3.0)
Describe the solution you'd like
The nodes that are closer than a certain threshold (which defaults to 0) should be merged into a single node.