luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

Construction from sf object breaks when circular linestrings occur #59

Closed Robinlovelace closed 4 years ago

Robinlovelace commented 4 years ago

Describe the bug

The function as_sfnetworks() fails on a dataset from Sheffield and we're not sure why.

Reproducible example

# aim: get road network data (and slopes) for any city

remotes::install_github("itsleeds/geofabrik")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'geofabrik' from a github remote, the SHA1 (73dd18af) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("itsleeds/slopes")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'slopes' from a github remote, the SHA1 (ecc77ae3) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(geofabrik)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0

city_name = "sheffield"
buffer_size = 2000

city_sf = tmaptools::geocode_OSM(city_name, as.sf = TRUE)
city_buffer = stplanr::geo_buffer(city_sf, dist = buffer_size)
mapview::mapview(city_buffer)


osm_data_city = get_geofabrik(name = city_sf)
#> although coordinates are longitude/latitude, st_contains assumes that they are planar
#> The place is within these geofabrik zones: Europe, Great Britain, Britain and Ireland, England, South Yorkshire
#> Selecting the smallest: South Yorkshire
#> Data already detected in ~/hd/data/osm/South Yorkshire.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpQB9Hws/ini_new.ini)
pryr::object_size(osm_data_city)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> 72.3 MB

osm_data_in_buffer = st_intersection(osm_data_city, city_buffer)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
pryr::object_size(osm_data_in_buffer)
#> 5.31 MB

mapview::mapview(osm_data_in_buffer)

osm_data_no_na = osm_data_in_buffer[!is.na(osm_data_in_buffer$highway), ]
pryr::object_size(osm_data_no_na)
#> 3.98 MB

summary(geo_type <- sf::st_geometry_type(osm_data_no_na))
#>           GEOMETRY              POINT         LINESTRING            POLYGON 
#>                  0                  0               5052                  0 
#>         MULTIPOINT    MULTILINESTRING       MULTIPOLYGON GEOMETRYCOLLECTION 
#>                  0                 29                  0                  0 
#>     CIRCULARSTRING      COMPOUNDCURVE       CURVEPOLYGON         MULTICURVE 
#>                  0                  0                  0                  0 
#>       MULTISURFACE              CURVE            SURFACE  POLYHEDRALSURFACE 
#>                  0                  0                  0                  0 
#>                TIN           TRIANGLE 
#>                  0                  0
osm_data_linestring = osm_data_no_na[geo_type == "LINESTRING", ]

library(slopes)
# lisbon_route_3d_auto = slope_3d(r = lisbon_route)
st_geometry(osm_data_linestring)
#> Geometry set for 5052 features 
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -1.500281 ymin: 53.3627 xmax: -1.440187 ymax: 53.39863
#> geographic CRS: WGS 84
#> First 5 geometries:
#> LINESTRING (-1.477983 53.38939, -1.477845 53.38...
#> LINESTRING (-1.484426 53.38227, -1.48437 53.382...
#> LINESTRING (-1.483918 53.3819, -1.483944 53.381...
#> LINESTRING (-1.478611 53.3741, -1.478629 53.374...
#> LINESTRING (-1.479656 53.37408, -1.479689 53.37...
osm_data_z = slope_3d(osm_data_linestring) # must be linestrings
#> Loading required namespace: ceramic
#> Preparing to download: 16 tiles at zoom = 14 from 
#> https://api.mapbox.com/v4/mapbox.terrain-rgb/
#> [1] TRUE
st_geometry(osm_data_z)
#> Geometry set for 5052 features 
#> geometry type:  LINESTRING
#> dimension:      XYZ
#> bbox:           xmin: -1.500281 ymin: 53.3627 xmax: -1.440187 ymax: 53.39863
#> z_range:        zmin: 42.05858 zmax: 180.2795
#> geographic CRS: WGS 84
#> First 5 geometries:
#> LINESTRING Z (-1.477983 53.38939 55.8, -1.47784...
#> LINESTRING Z (-1.484426 53.38227 121.745, -1.48...
#> LINESTRING Z (-1.483918 53.3819 108.5726, -1.48...
#> LINESTRING Z (-1.478611 53.3741 80.0275, -1.478...
#> LINESTRING Z (-1.479656 53.37408 79.60677, -1.4...
m = st_coordinates(osm_data_z)
head(m)
#>              X        Y        Z L1
#> [1,] -1.477983 53.38939 55.80000  1
#> [2,] -1.477845 53.38926 55.64014  1
#> [3,] -1.477702 53.38914 55.85570  1
#> [4,] -1.477549 53.38906 55.90000  1
#> [5,] -1.477468 53.38901 55.90000  1
#> [6,] -1.477391 53.38897 56.07922  1
osm_data_z$slope = slope_matrix_weighted(m)
# note to self: not intuitive at all!
plot(osm_data_z["slope"])

summary(osm_data_z$slope)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.02633 0.02633 0.02633 0.02633 0.02633 0.02633

net_3d = sfnetworks::as_sfnetwork(osm_data_z) # fails
#> Error in CPL_get_z_range(obj, 1): z error - expecting three columns;
net_3d = sfnetworks::as_sfnetwork(sf::st_zm(osm_data_z)) # fails
#> Error: Assigned data `value` must be compatible with existing data.
#> ✖ Existing data has 5052 rows.
#> ✖ Assigned data has 5022 rows.
#> ℹ Only vectors of size 1 are recycled.
osm_data_z2 = sf::st_as_sf(
  data.frame(stringsAsFactors = FALSE,
    osm_id = osm_data_z$osm_id,
    highway = osm_data_z$highway,
    slope = osm_data_z$slope
  ),
  geometry = sf::st_geometry(osm_data_z)
)
net_3d = sfnetworks::as_sfnetwork(sf::st_zm(osm_data_z2)) # fails
#> Error in `[[<-.data.frame`(`*tmp*`, i, value = c(1L, 3L, 5L, 7L, 9L, 5L, : replacement has 5022 rows, data has 5052
net_3d = sfnetworks::as_sfnetwork(sf::st_zm(osm_data_z2)[1:9, ]) # fails
#> Error in `[[<-.data.frame`(`*tmp*`, i, value = c(1L, 3L, 5L, 7L, 9L, 5L, : replacement has 8 rows, data has 9

Created on 2020-06-16 by the reprex package (v0.3.0)

Expected behavior A clear and concise description of what you expected to happen.

I expected this to work:

net_3d = sfnetworks::as_sfnetwork(sf::st_zm(osm_data_z2)) # fails

R Session Info Include session info of your session, with the function sessionInfo().

library(sfnetworks)

Created on 2020-06-16 by the reprex package (v0.3.0)

Session info ``` r devtools::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 3.6.3 (2020-02-29) #> os Ubuntu 18.04.4 LTS #> system x86_64, linux-gnu #> ui X11 #> language en_GB:en #> collate en_GB.UTF-8 #> ctype en_GB.UTF-8 #> tz Europe/London #> date 2020-06-16 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.6.0) #> backports 1.1.7 2020-05-13 [1] CRAN (R 3.6.3) #> callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.3) #> class 7.3-17 2020-04-26 [2] CRAN (R 3.6.3) #> classInt 0.4-3 2020-04-06 [1] Github (r-spatial/classInt@d024051) #> cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.2) #> crayon 1.3.4 2017-09-16 [2] standard (@1.3.4) #> DBI 1.1.0 2019-12-15 [2] CRAN (R 3.6.2) #> desc 1.2.0 2018-05-01 [2] standard (@1.2.0) #> devtools 2.3.0 2020-04-10 [1] CRAN (R 3.6.3) #> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.2) #> dplyr 1.0.0.9000 2020-06-16 [1] Github (tidyverse/dplyr@fd08fe9) #> e1071 1.7-3 2019-11-26 [2] CRAN (R 3.6.1) #> ellipsis 0.3.1 2020-05-15 [3] CRAN (R 3.6.3) #> evaluate 0.14 2019-05-28 [2] CRAN (R 3.6.0) #> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2) #> fs 1.4.1 2020-04-04 [2] CRAN (R 3.6.3) #> generics 0.0.2 2018-11-29 [3] CRAN (R 3.5.1) #> glue 1.4.1 2020-05-13 [2] CRAN (R 3.6.3) #> highr 0.8 2019-03-20 [3] CRAN (R 3.5.3) #> htmltools 0.4.0.9003 2020-04-09 [1] Github (rstudio/htmltools@1a7d0dc) #> igraph 1.2.5 2020-03-19 [1] CRAN (R 3.6.3) #> KernSmooth 2.23-17 2020-04-26 [4] CRAN (R 3.6.3) #> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.2) #> lifecycle 0.2.0.9000 2020-03-16 [1] Github (r-lib/lifecycle@355dcba) #> lwgeom 0.2-5 2020-06-12 [1] CRAN (R 3.6.3) #> magrittr 1.5 2014-11-22 [2] CRAN (R 3.5.2) #> memoise 1.1.0 2017-04-21 [3] CRAN (R 3.5.0) #> pillar 1.4.4 2020-05-05 [1] CRAN (R 3.6.3) #> pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 3.6.3) #> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 3.6.1) #> pkgload 1.1.0 2020-05-29 [3] CRAN (R 3.6.3) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2) #> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.3) #> ps 1.3.3 2020-05-08 [1] CRAN (R 3.6.3) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3) #> R6 2.4.1 2019-11-12 [2] CRAN (R 3.6.1) #> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3) #> remotes 2.1.1 2020-02-15 [1] CRAN (R 3.6.2) #> rlang 0.4.6.9000 2020-06-16 [1] Github (r-lib/rlang@eecbd90) #> rmarkdown 2.2 2020-05-31 [1] CRAN (R 3.6.3) #> rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.5.3) #> sessioninfo 1.1.1 2018-11-05 [3] CRAN (R 3.5.1) #> sf 0.9-4 2020-06-16 [1] Github (r-spatial/sf@aa6dba2) #> sfnetworks * 0.3.0 2020-06-16 [1] Github (luukvdmeer/sfnetworks@fad80c2) #> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2) #> stringr 1.4.0 2019-02-10 [2] standard (@1.4.0) #> testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.3) #> tibble 3.0.1 2020-04-20 [1] CRAN (R 3.6.3) #> tidygraph 1.2.0 2020-05-12 [2] CRAN (R 3.6.3) #> tidyr 1.1.0 2020-05-20 [3] CRAN (R 3.6.3) #> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3) #> units 0.6-7 2020-06-13 [1] CRAN (R 3.6.3) #> usethis 1.6.1 2020-04-29 [1] CRAN (R 3.6.3) #> vctrs 0.3.1 2020-06-05 [1] CRAN (R 3.6.3) #> withr 2.2.0 2020-04-20 [2] CRAN (R 3.6.3) #> xfun 0.14 2020-05-20 [1] CRAN (R 3.6.3) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2) #> #> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6 #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library ```

From our work on #46

agila5 commented 4 years ago

Hi! I think that I understood what's the problem here. The solution is not super simple but I will try my best to summarize it. The first part of the following reprex is taken from Robin's example.

# download and install packages
remotes::install_github("itsleeds/geofabrik", quiet = TRUE)
remotes::install_github("itsleeds/slopes", quiet = TRUE)
remotes::install_github("luukvdmeer/sfnetworks", upgrade = FALSE, quiet = TRUE)

# load packages
library(geofabrik)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(slopes)

# download geofabrik data
city_name = "sheffield"
buffer_size = 2000
city_sf = tmaptools::geocode_OSM(city_name, as.sf = TRUE)
city_buffer = stplanr::geo_buffer(city_sf, dist = buffer_size)

osm_data_city = get_geofabrik(name = city_sf)
osm_data_no_na = osm_data_in_buffer[!is.na(osm_data_in_buffer$highway), ]
geo_type <- sf::st_geometry_type(osm_data_no_na)
osm_data_linestring = osm_data_no_na[geo_type == "LINESTRING", ]

# add slopes
osm_data_z = slope_3d(osm_data_linestring) # must be linestrings

# create sfnetwork object
osm_data_no_z <- st_zm(osm_data_z)
#error
sfnetworks::as_sfnetwork(osm_data_no_z) 
#> Error: Assigned data `value` must be compatible with existing data.
#> x Existing data has 5057 rows.
#> x Assigned data has 5027 rows.
#> i Only vectors of size 1 are recycled.

Using the debugging functions I can see that the problem is created by circular LINESTRINGS, such as

par(mar = rep(0, 4))
plot(st_geometry(osm_data_no_z[3, ]))

indeed

st_boundary(st_geometry(osm_data_no_z)[3])
#> Geometry set for 1 feature  (with 1 geometry empty)
#> geometry type:  MULTIPOINT
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> MULTIPOINT EMPTY

and this is the reason why there are less rows than what it should be. The weird part is that there is no problem with stplanr:

osm_data_no_z_stplanr <- stplanr::SpatialLinesNetwork(osm_data_no_z)
#> Warning in SpatialLinesNetwork.sf(osm_data_no_z): Graph composed of multiple
#> subgraphs, consider cleaning it with sln_clean_graph().

Why? I think the reason is that stplanr builds the nodes starting exactly from the coordinate of each LINESTRING POINT: https://github.com/ropensci/stplanr/blob/b5addea23a1d952de2cc8aac123ce0a9f38e060c/R/SpatialLinesNetwork.R#L130-L135 I think that actually we could adopt the same approach here. Thoughts? I wouldn't use exactly the same ideas since the conversion to data.frame and dplyr operations are expensive for such a simple case, but I think we could apply something like this. First we need to extract the coordinates:

osm_data_no_z_coordinates <- st_coordinates(osm_data_no_z)
head(osm_data_no_z_coordinates, 10)
#>               X        Y L1
#>  [1,] -1.477983 53.38939  1
#>  [2,] -1.477845 53.38926  1
#>  [3,] -1.477702 53.38914  1
#>  [4,] -1.477549 53.38906  1
#>  [5,] -1.477468 53.38901  1
#>  [6,] -1.477391 53.38897  1
#>  [7,] -1.477221 53.38889  1
#>  [8,] -1.484426 53.38227  2
#>  [9,] -1.484370 53.38231  2
#> [10,] -1.484317 53.38238  2

then we should indentify the indexes of the first and last occurrences of the pairs of coordinates:

first_pair <- !duplicated(osm_data_no_z_coordinates[, "L1"])
last_pair <- !duplicated(osm_data_no_z_coordinates[, "L1"], fromLast = TRUE)
idxs <- first_pair | last_pair

I took the code from here: https://stackoverflow.com/questions/11546684/how-can-i-find-the-first-and-last-occurrences-of-an-element-in-a-data-frame Now we need to extract the pairs and rebuild the sfc structure:

osm_data_no_z_pairs <- osm_data_no_z_coordinates[idxs, ]
osm_data_no_z_nodes <- st_cast(st_sfc(st_multipoint(osm_data_no_z_pairs[, c("X", "Y")]), crs = st_crs(osm_data_no_z)), "POINT")
osm_data_no_z_nodes
#> Geometry set for 10114 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -1.500281 ymin: 53.3627 xmax: -1.440187 ymax: 53.39863
#> geographic CRS: WGS 84
#> First 5 geometries:
#> POINT (-1.477983 53.38939)
#> POINT (-1.477221 53.38889)
#> POINT (-1.484426 53.38227)
#> POINT (-1.479991 53.38661)
#> POINT (-1.483918 53.3819)

Now I try to wrap up everything in a function:

test_coordinates_pairs <- function(x) {
  # 1. extract coordinates
  x_coordinates <- unname(st_coordinates(x))

  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, 3])
  last_pair <- !duplicated(x_coordinates[, 3], fromLast = TRUE)
  idxs <- first_pair | last_pair

  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- st_cast(st_sfc(st_multipoint(x_pairs[, c(1, 2)]), crs = st_crs(x)), "POINT")
  x_nodes
}

Compare with sfnetworks::get_boundary_points and they look the same I think

test_coordinates_pairs(sfnetworks::roxel)
#> Geometry set for 1702 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#> geographic CRS: WGS 84
#> First 5 geometries:
#> POINT (7.533722 51.95556)
#> POINT (7.533461 51.95576)
#> POINT (7.532442 51.95422)
#> POINT (7.53209 51.95328)
#> POINT (7.532709 51.95209)
sfnetworks:::get_boundary_points(sfnetworks::roxel)
#> Geometry set for 1702 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#> geographic CRS: WGS 84
#> First 5 geometries:
#> POINT (7.533722 51.95556)
#> POINT (7.533461 51.95576)
#> POINT (7.532442 51.95422)
#> POINT (7.53209 51.95328)
#> POINT (7.532709 51.95209)

Check performance

bench::mark(
  test_coordinates_pairs(sfnetworks::roxel),
  sfnetworks:::get_boundary_points(sfnetworks::roxel),
  iterations = 20,
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression                                             min median `itr/sec`
#>   <bch:expr>                                          <bch:> <bch:>     <dbl>
#> 1 test_coordinates_pairs(sfnetworks::roxel)           30.2ms 35.7ms      25.1
#> 2 sfnetworks:::get_boundary_points(sfnetworks::roxel) 44.6ms 47.8ms      20.6
#> # ... with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

and it is slightly faster than the current approach. Why did I set check = FALSE? Because

all.equal(test_coordinates_pairs(sfnetworks::roxel), sfnetworks:::get_boundary_points(sfnetworks::roxel))
#> [1] "Attributes: < Component \"crs\": Names: 2 string mismatches >"                          
#> [2] "Attributes: < Component \"crs\": Component 1: Modes: character, numeric >"              
#> [3] "Attributes: < Component \"crs\": Component 1: target is character, current is numeric >"
#> [4] "Attributes: < Component \"crs\": Component 2: 1 string mismatch >"                      
#> [5] "Attributes: < Component \"ids\": Numeric: lengths (1, 851) differ >"

Do you know if the differences in the crs and ids are relevant? Obviously it should be tested a bit more but I just wanted to check your ideas before keep working.

Created on 2020-06-17 by the reprex package (v0.3.0)

I tested these ideas in another branch of this repo and it seems to be working:

# download and install packages
remotes::install_github("luukvdmeer/sfnetworks", ref = "cb42231f1fc50677dc916c0ccaef9cadd323d50d", upgrade = FALSE, quiet = TRUE)

# load packages
library(geofabrik)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(slopes)

# download geofabrik data
city_name = "sheffield"
buffer_size = 2000
city_sf = tmaptools::geocode_OSM(city_name, as.sf = TRUE)
city_buffer = stplanr::geo_buffer(city_sf, dist = buffer_size)

osm_data_city = get_geofabrik(name = city_sf)
#> although coordinates are longitude/latitude, st_contains assumes that they are planar
#> The place is within these geofabrik zones: Europe, Great Britain, Britain and Ireland, England, South Yorkshire
#> Selecting the smallest: South Yorkshire
#> Data already detected in C:/Users/Utente/Documents/my_osm_data_from_geofabric/South Yorkshire.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(C:\Users\Utente\AppData\Local\Temp\RtmpmuBoz5/ini_new.ini)
osm_data_in_buffer = st_intersection(osm_data_city, city_buffer)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
osm_data_no_na = osm_data_in_buffer[!is.na(osm_data_in_buffer$highway), ]
geo_type <- sf::st_geometry_type(osm_data_no_na)
osm_data_linestring = osm_data_no_na[geo_type == "LINESTRING", ]

# add slopes
osm_data_z = slope_3d(osm_data_linestring) # must be linestrings
#> Loading required namespace: ceramic
#> Preparing to download: 16 tiles at zoom = 14 from 
#> https://api.mapbox.com/v4/mapbox.terrain-rgb/

# create sfnetwork object
osm_data_no_z <- st_zm(osm_data_z)
osm_data_no_z_sfnetwork <- sfnetworks::as_sfnetwork(osm_data_no_z, directed = FALSE) 
plot(osm_data_no_z_sfnetwork)

Created on 2020-06-17 by the reprex package (v0.3.0)

If you think it's a good idea I will keep working in that separate branch and PR to develop.

luukvdmeer commented 4 years ago

Thanks for investigating @agila5 Unfortunately I cannot run the examples without a Mapbox API key.

The workflow you describe is exactly the same idea of how sfnetworks v0.2.0 used to work, see https://github.com/luukvdmeer/sfnetworks/blob/v0.2.0/R/utils.R. First extracting all coordinates, and then only taking the first and last one for each point. But your implementation is much neater and faster.

I changed to st_boundary to have clearer and easier understandable code. But it is interesting to see that the approach does not work for circular linestrings. A toy example to proof that once more:

library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
l = st_linestring(c(st_point(c(1,1)), st_point(c(1,1))))
st_boundary(l)
#> MULTIPOINT EMPTY

# Of course, l is not a valid linestring geometry, because in the simple
# feature standard, linestrings cannot have the same start and endpoint.
st_is_valid(l)
#> [1] FALSE

Created on 2020-06-17 by the reprex package (v0.3.0)

But in road networks, these circular linestrings will occur a lot right, e.g. for roundabouts and squares (that according the sf standard should be modeled as polygons, but that does not fit into the routable network structure).

I think we can hardly call the st_boundary behavior a bug, since we apply it to an invalid geometry, but lets say it is peculiar at least ;)

Having said that: I guess the conclusion is clear that st_boundary is not a good approach for spatial network creation from linestrings, and your code looks neat and fast. So lets do that!

I also do not see what exactly the difference in the crs is, but since st_crs(test_coordinates_pairs(roxel)) == st_crs(roxel)) returns TRUE I would not worry too much about that. Maybe @Robinlovelace has an idea?

Note: we should first merge master into develop so that they are equal again. Sorry I did not do that before

luukvdmeer commented 4 years ago

One thing I just noticed in the code @agila5 :

If Z and/or M coordinates are present, the L1 column will not be the third column in the coordinates matrix, but only the fourth or fifth. Also, x_pairs[, c(1, 2)] will only take the X and Y coordinate, but not the Z and/or M coordinate if present.

library(sfnetworks)
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2

test_coordinates_pairs = function(x) {
  # 1. extract coordinates
  x_coordinates <- unname(st_coordinates(x))

  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, 3])
  last_pair <- !duplicated(x_coordinates[, 3], fromLast = TRUE)
  idxs <- first_pair | last_pair

  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- st_cast(st_sfc(st_multipoint(x_pairs[, c(1, 2)]), crs = st_crs(x)), "POINT")
  x_nodes
}

roxel_z = st_zm(roxel, drop = FALSE, what = "Z")

sfnetworks:::get_boundary_points(roxel_z)
#> Geometry set for 1702 features 
#> geometry type:  POINT
#> dimension:      XYZ
#> bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#> CRS:            EPSG:4326
#> First 5 geometries:
#> POINT Z (7.533722 51.95556 0)
#> POINT Z (7.533461 51.95576 0)
#> POINT Z (7.532442 51.95422 0)
#> POINT Z (7.53209 51.95328 0)
#> POINT Z (7.532709 51.95209 0)

test_coordinates_pairs(roxel_z)
#> Geometry set for 2 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.530218 ymin: 51.95556 xmax: 7.533722 ymax: 51.95924
#> CRS:            EPSG:4326
#> POINT (7.533722 51.95556)
#> POINT (7.530218 51.95924)

Created on 2020-06-17 by the reprex package (v0.3.0)

agila5 commented 4 years ago

Hi and thanks for the review!

The workflow you describe is exactly the same idea of how sfnetworks v0.2.0 used to work, see https://github.com/luukvdmeer/sfnetworks/blob/v0.2.0/R/utils.R.

I totally missed that 😅 Still, it's good that the underlying ideas are the same.

If Z and/or M coordinates are present, the L1 column will not be the third column in the coordinates matrix, but only the fourth or fifth. Also, x_pairs[, c(1, 2)] will only take the X and Y coordinate, but not the Z and/or M coordinate if present.

That's not super difficult to fix. For example, I was thinking of:

# packages
library(sfnetworks)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

test_coordinates_pairs = function(x) {
  # 1a. extract coordinates
  x_coordinates <- st_coordinates(x)

  # 1b. Find index of L1 column
  L1_index <- which(colnames(x_coordinates) == "L1")

  # 1c. Remove colnames
  x_coordinates <- unname(x_coordinates)

  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, L1_index])
  last_pair <- !duplicated(x_coordinates[, L1_index], fromLast = TRUE)
  idxs <- first_pair | last_pair

  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- st_cast(st_sfc(st_multipoint(x_pairs[, -L1_index]), crs = st_crs(x)), "POINT")
  x_nodes
}

roxel_z = st_zm(roxel, drop = FALSE, what = "Z")

test_coordinates_pairs(roxel_z)
#> Geometry set for 1702 features 
#> geometry type:  POINT
#> dimension:      XYZ
#> bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#> geographic CRS: WGS 84
#> First 5 geometries:
#> POINT Z (7.533722 51.95556 0)
#> POINT Z (7.533461 51.95576 0)
#> POINT Z (7.532442 51.95422 0)
#> POINT Z (7.53209 51.95328 0)
#> POINT Z (7.532709 51.95209 0)
sfnetworks:::get_boundary_points(roxel_z)
#> Geometry set for 1702 features 
#> geometry type:  POINT
#> dimension:      XYZ
#> bbox:           xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#> geographic CRS: WGS 84
#> First 5 geometries:
#> POINT Z (7.533722 51.95556 0)
#> POINT Z (7.533461 51.95576 0)
#> POINT Z (7.532442 51.95422 0)
#> POINT Z (7.53209 51.95328 0)
#> POINT Z (7.532709 51.95209 0)

Created on 2020-06-17 by the reprex package (v0.3.0)

Other suggestions are welcome. I'm not sure if the unname step is strictly necessary but I read once that there may be some problems with named sfc objects (https://github.com/ropensci/osmdata/issues/188). Anyway, I will test this a little bit more in the next days using OSM data of other cities and, if it still looks good, create a PR!

luukvdmeer commented 4 years ago

I'm not sure if the unname step is strictly necessary

I don't think it is necessary. The result of st_coordinates is a matrix with column names, but those names are not preserved later in in the sfc either way (or am I wrong?)

Other suggestions are welcome.

The L1 column will always be the last one. Knowing that, getting L1_index by ncol(x_coordinates) is slightly faster than which(colnames(x_coordinates) == "L1"). But hey, what are we talking about here.. Nanoseconds :laughing:

I changed the name of the issue, now we know where the problem occurs. Also, I updated develop to be equal with master again. Can I assign you for this @agila5 ? And would it be possible to implement before our next milestone on June 30th? - if not, I can implement your function as well -

agila5 commented 4 years ago

I don't think it is necessary. The result of st_coordinates is a matrix with column names, but those names are not preserved later in in the sfc either way (or am I wrong?)

This is what I meant.

# packages
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

# data
my_linestring <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 2)))

# test with the absence of unname
test_coordinates_pairs = function(x) {
  # 1a. extract coordinates
  x_coordinates <- st_coordinates(x)

  # 1b. Find index of L1 column
  L1_index <- which(colnames(x_coordinates) == "L1")

  # 2. Find idxs of first and last coordinate (i.e. the boundary points)
  first_pair <- !duplicated(x_coordinates[, L1_index])
  last_pair <- !duplicated(x_coordinates[, L1_index], fromLast = TRUE)
  idxs <- first_pair | last_pair

  # 3. Extract idxs and rebuild sfc
  x_pairs <- x_coordinates[idxs, ]
  x_nodes <- st_cast(st_sfc(st_multipoint(x_pairs[, -L1_index]), crs = st_crs(x)), "POINT")
  x_nodes
}

test_sfc <- test_coordinates_pairs(my_linestring)
lapply(test_sfc, names)
#> [[1]]
#> [1] "X" "Y"
#> 
#> [[2]]
#> [1] "X" "Y"

Created on 2020-06-20 by the reprex package (v0.3.0)

The L1 column will always be the last one. Knowing that, getting L1_index by ncol(x_coordinates) is slightly faster than which(colnames(x_coordinates) == "L1"). But hey, what are we talking about here.. Nanoseconds 😆

:thumbsup:

I changed the name of the issue, now we know where the problem occurs. Also, I updated develop to be equal with master again. Can I assign you for this @agila5 ? And would it be possible to implement before our next milestone on June 30th? - if not, I can implement your function as well -

Yes and yes. And great work!

luukvdmeer commented 4 years ago

With PR https://github.com/luukvdmeer/sfnetworks/pull/63 this is now fixed in develop. As an additional goodie, constructing the network from roxel now got down to more or less 46 milliseconds on my laptop! Thanks @agila5

Robinlovelace commented 4 years ago

Fantastic work guys, I can give this a spin. Amazing to hear of the speed-ups also.

agila5 commented 4 years ago

With PR #63 this is now fixed in develop. As an additional goodie, constructing the network from roxel now got down to more or less 46 milliseconds on my laptop! Thanks @agila5

Awesome, you're welcome 👍

Robinlovelace commented 4 years ago

Here are some more tests on a similar dataset. This one does not require a MapBox API key, that was just for testing the Z dimension, which seems to work. Some observations from the reproducible example below:

remotes::install_github("itsleeds/geofabrik", quiet = TRUE)
remotes::install_github("luukvdmeer/sfnetworks", ref = "develop", quiet = TRUE)

library(geofabrik)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(sfnetworks)

# download geofabrik data
city_name = "sheffield"
buffer_size = 2000
city_sf = tmaptools::geocode_OSM(city_name, as.sf = TRUE)
city_buffer = stplanr::geo_buffer(city_sf, dist = buffer_size)

osm_data_city = get_geofabrik(name = city_sf)
#> although coordinates are longitude/latitude, st_contains assumes that they are planar
#> The place is within these geofabrik zones: Europe, Great Britain, Britain and Ireland, England, South Yorkshire
#> Selecting the smallest: South Yorkshire
#> Data already detected in ~/hd/data/osm/South Yorkshire.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpS6Hzi5/ini_new.ini)
osm_data_highway = osm_data_city[!is.na(osm_data_city$highway), ]
osm_data_buffer = osm_data_highway[city_buffer, ]
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
mapview::mapview(st_as_sf(net %>% activate("edges")))
#> Error in eval(lhs, parent, parent): object 'net' not found

net = sfnetworks::as_sfnetwork(osm_data_buffer)
p1 = tmaptools::geocode_OSM("sheffield rail station", as.sf = TRUE)
#> No results found for "sheffield rail station".
p2 = tmaptools::geocode_OSM("west street sheffield", as.sf = TRUE)
route = net %>%  
  activate("edges") %>%  
  tidygraph::convert(to_spatial_shortest_paths, p1, p2)
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
mapview::mapview(sf::st_as_sf(route)) +
  mapview::mapview(p1) +
  mapview::mapview(p2)

p1 # it calculates a route when p1 is NULL!
#> NULL

p1 = tmaptools::geocode_OSM("sheffield tap", as.sf = TRUE)
route = net %>%  
  activate("edges") %>%  
  tidygraph::convert(to_spatial_shortest_paths, p1, p2)
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
mapview::mapview(sf::st_as_sf(route)) +
  mapview::mapview(p1) +
  mapview::mapview(p2)


net = as_sfnetwork(osm_data_buffer, directed = FALSE)
route = net %>%  
  activate("edges") %>%  
  tidygraph::convert(to_spatial_shortest_paths, p1, p2)
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
mapview::mapview(sf::st_as_sf(route)) +
  mapview::mapview(p1) +
  mapview::mapview(p2)

Created on 2020-06-22 by the reprex package (v0.3.0)

luukvdmeer commented 4 years ago

Fixed in 0.3.1

Robinlovelace commented 4 years ago

For sure? I think there should be a test or at least an example to show the fix.

luukvdmeer commented 4 years ago

Sorry to be unclear. This is the fix from @agila5 that is showcased and tested above, but was until now only present in develop. The only thing that changed now is that it is merged to master (which allows to close the issue)

Robinlovelace commented 4 years ago

Great. Worth showcasing with @examples in the documentation maybe. But yes certainly is fixed and see that from the example above.