Snap spatial points to the nearest point on the nearest edge

Please describe the problem. It would be nice to snap any given point to its nearest point on the network, no matter if that is an existing node or not. Then, that nearest network point can be added as a new node to the network, by splitting the edge it is located on (using for example lwgeom::st_split).

In sf, it is already possible to find the nearest point on a feature to another given feature. A quick function I wrote to do this:

# x is a set of input points.
# y is the edges element of the sfnetwork, containing lines.
st_snap_to_network = function(x, y) {
  # Function to find the nearest edge point to a single input point.
  f = function(z) {
    # Convert sfg to sfc.
    z = sf::st_sfc(z, crs = sf::st_crs(x))
    # Run st_nearest_points.
    # This returns for each combination (z, y) a linestring from z to the
    # .. nearest point on y.
    np = sf::st_nearest_points(z, y)
    # Then, the endpoint of the np-line to the nearest feature y to z
    # .. is the nearest edge point to z.
    lwgeom::st_endpoint(np[sf::st_nearest_feature(z, y),])
  }
  # Run f for all input points.
  geoms = do.call("c", lapply(sf::st_geometry(x), f))
  # Replace the geometries of x with the corresponding nearest edge points.
  if (inherits(x, "sf")) st_geometry(x) = geoms
  if (inherits(x, "sfc")) x = geoms
  x
}

However, there is a problem with this approach. The returned point does not always intersect with the network.. Sometimes it is located a few millimetres away from the edge. This has to do with floating point precision.

See: https://gis.stackexchange.com/questions/288988/nearest-point-does-not-intersect-line-in-sf-r-package And: https://github.com/r-spatial/sf/issues/790

This reply does not sound hopeful:

I think this problem cannot be solved. Setting precision essentially rounds coordinates to a (scaled) integer grid. If you want to represent points on lines connecting points on an integer grid exactly, you need to store them as rational numbers, i.e. as a ratio of two (possibly very large) integers. What R (or GEOS) does is approximating this ratio when storing it as an eight byte double. By that, we're back at R FAQ 7.31.

Describe how you currently solve the problem, or how attempted to solve it There is a recent blogpost addressing this problem, which uses a solution of drawing a small buffer around the returned nearest network point: https://ryanpeek.org/mapping-in-R-workshop/03_vig_snapping_points_to_line.html

Another option would be to include the returned nearest network point in the edge, by moving the edge slightly.

Describe the desired functionality of sfnetworks A function snap_to_network that replaces the geometry of the input point by the geometry of its nearest network point, that is guaranteed to intersect with the network. This can then also be used in shortest paths calculations between any set of given points.

A reprex:

library(sfnetworks)

# x is a set of points.
# y is the edges element of the sfnetwork, containing lines.
st_snap_to_network = function(x, y) {
  # Function to find the nearest network point to a single input point.
  f = function(z) {
    # Convert sfg to sfc.
    z = sf::st_sfc(z, crs = sf::st_crs(x))
    # Run st_nearest_points.
    # This returns for each combination (z, y) a linestring from z to the
    # .. nearest point on y.
    np = sf::st_nearest_points(z, y)
    # Then, the endpoint of the np-line to the nearest feature y from z
    # .. is the nearest network point to z.
    lwgeom::st_endpoint(np[sf::st_nearest_feature(z, y),])
  }
  # Run f for all input points.
  geoms = do.call("c", lapply(sf::st_geometry(x), f))
  # Replace the geometries of x with their nearest network points.
  if (inherits(x, "sf")) sf::st_geometry(x) = geoms
  if (inherits(x, "sfc")) x = geoms
  x
}

# Some edges.
l = dplyr::slice(roxel, 1:3)

# A point not on the edges.
p = sf::st_centroid(sf::st_union(l))
#> Warning in st_centroid.sfc(sf::st_union(l)): st_centroid does not give correct
#> centroids for longitude/latitude data

# The nearest point to p located on the edges.
np = st_snap_to_network(p, l)
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2

# Plot.
plot(sf::st_geometry(l))
plot(sf::st_nearest_points(l, p), col = "red", add = T)
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
plot(p, pch = 20, add = T)
plot(np, col = "red", pch = 20, add = T)


# Does np intersect l? Yes! That's good.
sf::st_intersects(np, l)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
#>  1: 2

# Now lets use a slightly moved p.
p2 = sf::st_sfc(
  p[[1]] + sf::st_point(c(0, 0.0004)),
  crs = sf::st_crs(p)
)

# The nearest point to p2 located on the edges.
np2 = st_snap_to_network(p2, l)
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar

# Plot. Looks good again.
plot(sf::st_geometry(l))
plot(sf::st_nearest_points(l, p2), col = "red", add = T)
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
plot(p2, pch = 20, add = T)
plot(np2, col = "red", pch = 20, add = T)


# However.. Does np2 intersects l? No.
sf::st_intersects(np2, l)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
#>  1: (empty)

# It is located 0.000008383502 meters from the linestring.
sf::st_distance(np2, l[2,])
#> Units: [m]
#>              [,1]
#> [1,] 8.383502e-06

^{Created on 2020-06-08 by the reprex package (v0.3.0)}

(just as a remark regarding the long/lat warning: projecting the data - I tested EPSG 31466 does not solve it either)

Regarding the nearest point on the nearest edge. How should we name such a point when creating a function that finds it?

nearest_edge_vertex (might be confusing because igraph uses the vertex to refer to what sfnetworks and tidygraph call nodes).
nearest_geometric_point (this is what `shp2graph calls it, but it is quite long and to me not so clear)
nearest_edge_point (less confusing but does not sound so beautiful either)
...

Any ideas @Robinlovelace @agila5 @loreabad6 (and others)?

Any ideas @Robinlovelace @agila5 @loreabad6 (and others)?

Maybe nearest_point_on_edge ?

Outcomes from a small hack with @luukvdmeer today:

When we slightly change the proposed function to:

st_snap_to_network = function(x, y) {
  # Function to find the nearest network point to a single input point.
  f = function(z) {
    # Convert sfg to sfc.
    z = sf::st_sfc(z, crs = sf::st_crs(x))
    # Run st_nearest_points.
    # This returns for each combination (z, y) a linestring from z to the
    # .. nearest point on y.
    np = sf::st_nearest_points(z, y)
    # Then, we get the np-line to the nearest feature y from z
    np[sf::st_nearest_feature(z, y),] ### LINE CHANGED HERE!
  }
  # Run f for all input points.
  geoms = do.call("c", lapply(sf::st_geometry(x), f))
  # Replace the geometries of x with their nearest network points.
  if (inherits(x, "sf")) sf::st_geometry(x) = geoms
  if (inherits(x, "sfc")) x = geoms
  x
}

We noticed:

When the resulting LINESTRING from sf::st_nearest_points(), is passed onto sf::st_intersects(), it seems to always intersect with the nearest edge.
Hence, the LINESTRING itself can be passed onto lwgeom::st_split() and so the edge will be split by the end-point of this line.
BUT, this end-point, as showed before, does not intersect with the edge.
So, what lwgeom::st_split() does is move slightly the newly created LINESTRINGs to include this POINT.

Since lwgeom and sf seem to handle the precision problem in this way, we think we can then work with these functions to address this issue. In that way, we avoid adding extra precision bias by doing things like calculating an st_buffer around the snapped point to then split the line.

Won't fix, because of the precision error problem. For any point p and network G, it is practically impossible to return the projected (i.e. snapped) point of p on G with the guarantee that it intersects with G. Even an intersection of a point with a line does not always intersect with the line itself. Against all geometric rules, I know, but see below:

library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2

v1 = st_point(c(7.532442, 51.95422))
v2 = st_point(c(7.53236, 51.95377))
v3 = st_point(c(7.53209, 51.95328))
l1 = st_sfc(st_linestring(c(v1, v2, v3)), crs = 4326)

v4 = st_point(c(7.53209, 51.95345))
v5 = st_point(c(7.53245, 51.95345))
l2 = st_sfc(st_linestring(c(v4, v5)), crs = 4326)

st_intersects(st_intersection(l2, l1), l1, sparse = F)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#>       [,1]
#> [1,] FALSE

# Just to check:
# It is not the planar assumption that causes this.
# Transformed to a projected CRS same issue exists.
l1 = st_transform(l1, 3035)
l2 = st_transform(l2, 3035)
st_intersects(st_intersection(l2, l1), l1, sparse = F)
#>       [,1]
#> [1,] FALSE

# A plot of the situation.
plot(l1)
plot(l2, add = T)
plot(st_intersection(l2, l1), pch = 20, col = "red", add = T)

^{Created on 2020-11-05 by the reprex package (v0.3.0)}

That makes the snapping function pretty useless I think. The way to go now would be to first blend the points as nodes into the network with st_blend (see #27) and then find the nearest node. That is a clean workflow that makes sense conceptually also I think.

The st_blend function handles the precision problem internally, as follows:

Situation: A point p and a network G. Goal: Find the projection q of p on G, such that we can subdivide G at q (i.e. use q to split the edge in G on which it is located).

Step 1

Find the edge x in G that is closest to p, using st_nearest_feature

Step 2

Draw the shortest possible connection c between p and x, using st_nearest_points

Step 3

Now, there are three options:

Option 1

The endpoint b of c intersects with x. Hence, q is equal to b.

Option 2

Because of precision errors, the endpoint b of c is located a few nano-meters away from x, but at the opposite side of x than p. That is, b itself does not intersect with x, but c does. Hence, conceptually q is equal to the intersection of c and x. Because of precision errors, however, this intersection might not intersect with x. That is, it is impossible for us to return the true q. However, since c does intersect with x, we can now use c to subdivide G.

Option 3

Because of precision errors, the endpoint b of c is located a few nano-meters away from x, but at the same side of x as p. That is, neither b nor c intersect with x. This is the problematic case. What st_blend does is:

Calculating the distance d between b and x.
Extending c in the same direction (hence, making a 90 degree angle with x) by 2*d.

This should guarantee intersection between the extended c and x. Hence, conceptually, q is equal to that intersection. Because of precision errors, however, this intersection might not intersect with x. That is, it is impossible for us to return the true q. However, since our extended c does intersect with x, we can use now c to subdivide G.

The three options are plotted (very exaggerated) below:

library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
par(mar = c(1,1,1,1))

x = st_linestring(c(st_point(c(0,-0.5)), st_point(c(0,2.5))))
p1 = st_point(c(-1,2))
p2 = st_point(c(-1,1))
p3 = st_point(c(-1,0))
b1 = st_point(c(0,2))
b2 = st_point(c(0.2,1))
b3 = st_point(c(-0.2,0))
c1 = st_linestring(c(p1, b1))
c2 = st_linestring(c(p2, b2))
c3 = st_linestring(c(p3, b3))
c3_ext = sfnetworks:::extend_line(c3, 0.4)

plot(x)
plot(c1, col = "red", add = T)
plot(c2, col = "red", add = T)
plot(c3, col = "red", add = T)
plot(c3_ext, col = "red", lty = 2, add = T)
plot(p1, pch = 20, add = T)
plot(p2, pch = 20, add = T)
plot(p3, pch = 20, add = T)
plot(b1, pch = 20, col = "red", add = T)
plot(b2, pch = 20, col = "red", add = T)
plot(b3, pch = 20, col = "red", add = T)
plot(st_intersection(c1, x), pch = 4, add = T)
plot(st_intersection(c2, x), pch = 4, add = T)
plot(st_intersection(c3_ext, x), pch = 4, add = T)

^{Created on 2020-11-05 by the reprex package (v0.3.0)}

Note that this does require to have the processes of projection (i.e. snapping of the point to the network) and subdivision (i.e. splitting the edge) inside the same function. That is what st_blend is. We can not split it into two separate functions for the individual processes, because as said, it is sometimes impossible to return the true location of the projection q.

Unfortunately this whole precision error does influence performance. Because even when you set your tolerance to 0, i.e. you only want to blend points that are already located on edges, we have to internally change this 0 to a very small number (currently 1e-5) because points that are on an edge may as well not be evaluated as intersecting. Hence, we have to search for all points located within the (small) tolerance before proceeding. This is time consuming.

Note also that this is not an R specific error. I read in a recent blogpost that in Pyhton it is the same. See https://towardsdatascience.com/connecting-pois-to-a-road-network-358a81447944

What do you think @Robinlovelace @agila5 @loreabad6 Sounds like a reasonable hack?

Won't fix as explained above.

luukvdmeer / sfnetworks