luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

More performant distance calculation when explicitizing edges #180

Closed agila5 closed 2 years ago

agila5 commented 2 years ago

Describe the bug The function st_network_paths may return an error when running it on a large unweighted sfnetwork object with spatially implicit edges. See the reprex below. I think that the problem is that here: https://github.com/luukvdmeer/sfnetworks/blob/112d33ff7ba8e4482aa8efa416062167ea069462/R/edge.R#L112-L120 we should set by_element = TRUE and remove the diag call.

Reproducible example

# packages
library(tidygraph)
library(sfnetworks)

# simulate data
my_graph <- play_geometry(n = 20000, radius = 0.02)
my_sfn <- as_sfnetwork(my_graph, coords = c("x", "y"))
#> Checking if spatial network structure is valid...
#> Spatial network structure is valid
igraph::ecount(my_sfn)
#> [1] 247799

# shortest path
st_network_paths(my_sfn, 1, 2)
#> Error: cannot allocate vector of size 457.5 Gb

Created on 2021-11-17 by the reprex package (v2.0.1)

Expected behavior No error.

R Session Info

Session info ``` r sessioninfo::session_info() #> - Session info -------------------------------------------------------------- #> hash: man health worker, no one under eighteen, woman firefighter: medium skin tone #> #> setting value #> version R version 4.1.1 (2021-08-10) #> os Windows 10 x64 (build 19043) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate Italian_Italy.1252 #> ctype Italian_Italy.1252 #> tz Europe/Berlin #> date 2021-11-17 #> pandoc 2.11.4 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown) #> #> - Packages ------------------------------------------------------------------- #> package * version date (UTC) lib source #> abind 1.4-5 2016-07-21 [1] CRAN (R 4.1.1) #> backports 1.3.0 2021-10-27 [1] CRAN (R 4.1.1) #> class 7.3-19 2021-05-03 [1] CRAN (R 4.1.1) #> classInt 0.4-3 2020-04-07 [1] CRAN (R 4.1.1) #> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1) #> colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.1) #> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1) #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.1) #> deldir 1.0-6 2021-10-23 [1] CRAN (R 4.1.1) #> digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.1) #> dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.1) #> e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.1) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.1) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.1) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.1) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.1) #> generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1) #> ggplot2 3.3.5 2021-06-25 [1] CRAN (R 4.1.1) #> glue 1.5.0 2021-11-07 [1] CRAN (R 4.1.1) #> goftest 1.2-3 2021-10-07 [1] CRAN (R 4.1.1) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.1) #> highr 0.9 2021-04-16 [1] CRAN (R 4.1.1) #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1) #> igraph 1.2.7 2021-10-15 [1] CRAN (R 4.1.1) #> KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.1) #> knitr 1.36 2021-09-29 [1] CRAN (R 4.1.1) #> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.1) #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1) #> lwgeom 0.2-8 2021-10-06 [1] CRAN (R 4.1.1) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.1) #> Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1) #> mgcv 1.8-36 2021-06-01 [1] CRAN (R 4.1.1) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.1) #> nlme 3.1-153 2021-09-07 [1] CRAN (R 4.1.1) #> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.1) #> polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.1.1) #> proxy 0.4-26 2021-06-07 [1] CRAN (R 4.1.1) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.1) #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.1) #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.1) #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.1) #> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1) #> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.1) #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1) #> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1) #> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1) #> rpart 4.1-15 2019-04-12 [1] CRAN (R 4.1.1) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.1) #> scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.1) #> sessioninfo 1.2.1 2021-11-02 [1] CRAN (R 4.1.1) #> sf 1.0-3 2021-10-07 [1] CRAN (R 4.1.1) #> sfheaders 0.4.0 2020-12-01 [1] CRAN (R 4.1.1) #> sfnetworks * 0.5.2 2021-05-13 [1] CRAN (R 4.1.1) #> spatstat 2.2-0 2021-06-23 [1] CRAN (R 4.1.1) #> spatstat.core 2.3-0 2021-07-16 [1] CRAN (R 4.1.1) #> spatstat.data 2.1-0 2021-03-21 [1] CRAN (R 4.1.1) #> spatstat.geom 2.3-0 2021-10-09 [1] CRAN (R 4.1.1) #> spatstat.linnet 2.3-0 2021-07-17 [1] CRAN (R 4.1.1) #> spatstat.sparse 2.0-0 2021-03-16 [1] CRAN (R 4.1.1) #> spatstat.utils 2.2-0 2021-06-14 [1] CRAN (R 4.1.1) #> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.1) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.1) #> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1) #> tensor 1.5 2012-05-05 [1] CRAN (R 4.1.1) #> tibble 3.1.5 2021-09-30 [1] CRAN (R 4.1.1) #> tidygraph * 1.2.0 2020-05-12 [1] CRAN (R 4.1.1) #> tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.1.1) #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.1) #> units 0.7-2 2021-06-08 [1] CRAN (R 4.1.1) #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.1) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1) #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.1) #> xfun 0.27 2021-10-18 [1] CRAN (R 4.1.1) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.1) #> #> [1] C:/Program Files/R/R-4.1.1/library #> #> ------------------------------------------------------------------------------ ```
luukvdmeer commented 2 years ago

Interesting. I know the distance calculating is quite heavy indeed. But from what I remember I first had it implemented with the by_element = TRUE, but found that surprisingly this is much slower than just calculating the full matrix. This was already a while ago though so maybe it changed or I remember wrong.

Another option to look into could be using geodist. This would require us to first transform the nodes to EPSG:4326 before calculating the distances, and that will probably lead to some accuracy loss, so the best option then would be to let the user explicitly request it through e.g. use_geodist = TRUE. This would also allow to only include geodist as optional rather than required dependency. But we will need to do some benchmarks to check how much we gain from it.

luukvdmeer commented 2 years ago

You are right @agila5 , setting by_element = T is much faster. Changed that now in v0.6.0