Benchmarking - Githubissues

Robinlovelace commented 4 years ago

For another project I've done some benchmarks and it seems that sfnetworks is already pretty fast. Wonder if we can make it even faster!

# Aim: benchmark the performance of different spatial network packages

library(magrittr)
library(stplanr)
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
piggyback::pb_download("chapeltown_leeds_key_roads.Rds", repo = "ropensci/stplanr", dest = ".", show_progress = FALSE)
chapeltown_leeds_key_roads <- readRDS("chapeltown_leeds_key_roads.Rds")
x <- chapeltown_leeds_key_roads %>% 
  st_transform(crs = geo_select_aeq(.))
x_sp = as(x, "Spatial")

# spatial network creation ------------------------------------------------

stplanr <- function() stplanr::SpatialLinesNetwork(x)
sfnetworks <- function() sfnetworks::sfn_asnetwork(x)
dodgr <- function() dodgr::weight_streetnet(x)
shp2graph <- function() shp2graph::readshpnw(x_sp)

bench::mark(check = FALSE, stplanr(), sfnetworks(), dodgr(), shp2graph())
#> Warning in SpatialLinesNetwork.sf(x): Graph composed of multiple subgraphs,
#> consider cleaning it with sln_clean_graph().
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 4 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 stplanr()     137.2ms 142.87ms     7.03     7.85MB     5.27
#> 2 sfnetworks()  64.69ms  69.22ms    14.3      4.49MB     7.15
#> 3 dodgr()         1.41s    1.41s     0.709   68.29MB     2.84
#> 4 shp2graph()  310.25ms 324.86ms     3.08   473.86MB    27.7

Created on 2019-11-29 by the reprex package (v0.3.0)
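To make the comparison easier to read, here is a small sketch that turns the median timings reported above into slowdown factors relative to sfnetworks. The numbers are copied by hand from the table (1.41 s entered as 1410 ms), and the variable names are only illustrative. This is a single run on one dataset, so treat the ratios as indicative only.

```r
# Median timings in milliseconds, copied from the bench::mark() table above
median_ms <- c(
  stplanr    = 142.87,
  sfnetworks = 69.22,
  dodgr      = 1410,    # reported as 1.41s
  shp2graph  = 324.86
)

# How many times slower each package is than sfnetworks on this dataset
slowdown <- median_ms / median_ms[["sfnetworks"]]
print(round(slowdown, 1))
```

On these figures stplanr is roughly 2 times slower than sfnetworks at network creation, shp2graph roughly 4.7 times, and dodgr roughly 20 times.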

luukvdmeer commented 4 years ago

I added this to milestone 3 (our last milestone), so that we can do the benchmarking towards the end of the project, when the core of the code is finished and it is time to fine-tune.

Is it OK if I assign this to you, @Robinlovelace?

Robinlovelace commented 4 years ago

Sure, I'm up for that. It will be good to generate some consistent benchmarks. I'll start by looking for open network datasets that other projects use for benchmarking.

mvl22 commented 4 years ago

Benchmarking of routing performance will depend entirely on what you're optimising for:

Network size (city vs continent)
Dynamic data (e.g. live traffic)
Number of routing output types, involving shared data
Transport type (bicycle will involve considering more of the graph than car)
Whether you want the OpenTripPlanner-style triangle to compute preferences dynamically rather than optimise up-front
Speed of returned result
CPU
RAM footprint

OSRM, for instance, is fast for massive networks but less suitable once you want rapidly changing live traffic data, because there is less scope for up-front optimisation.

Robinlovelace commented 4 years ago

One approach to continuous benchmarking is this: https://github.com/r-lib/bench#continuous-benchmarking

Thoughts @agila5, @loreabad6 and @luukvdmeer? Worth a try I guess, but it could be overly complex compared with manually reporting benchmarks in the README with each build.

Robinlovelace commented 4 years ago

Good news so far: sfnetworks seems to be faster at creating spatial objects, even though the object sizes are larger:

library(sfnetworks)
system.time({
  net = as_sfnetwork(roxel)
})
#>    user  system elapsed 
#>   0.062   0.001   0.062
system.time({
  net2 = stplanr::SpatialLinesNetwork(roxel)
})
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
#> Warning in SpatialLinesNetwork.sf(roxel): Graph composed of multiple subgraphs,
#> consider cleaning it with sln_clean_graph().
#>    user  system elapsed 
#>   0.859   0.020   0.879
pryr::object_size(net)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> 807 kB
pryr::object_size(net2)
#> 447 kB

res = bench::press(
  n = seq(from = 10, to = nrow(roxel), length.out = 5),
  {
    bench::mark(
      check = FALSE,
      time_unit = "ms",
      stplanr::SpatialLinesNetwork(roxel[1:n, ]),
      sfnetworks::as_sfnetwork(roxel[1:n, ])
    )
  }
)
#> Running with:
#>       n
#> 1   10

ggplot2::autoplot(res)

Created on 2020-06-22 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_GB:en
#>  collate  en_GB.UTF-8
#>  ctype    en_GB.UTF-8
#>  tz       Europe/London
#>  date     2020-06-22
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [2] CRAN (R 3.6.0)
#>  backports     1.1.8      2020-06-17 [1] CRAN (R 3.6.3)
#>  beeswarm      0.2.3      2016-04-25 [1] CRAN (R 3.6.1)
#>  bench         1.1.1      2020-01-13 [2] CRAN (R 3.6.2)
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.3)
#>  class         7.3-17     2020-04-26 [2] CRAN (R 3.6.3)
#>  classInt      0.4-3      2020-04-06 [1] Github (r-spatial/classInt@d024051)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.2)
#>  codetools     0.2-16     2018-12-24 [4] CRAN (R 3.6.3)
#>  colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.6.3)
#>  crayon        1.3.4      2017-09-16 [2] standard (@1.3.4)
#>  curl          4.3        2019-12-02 [2] CRAN (R 3.6.2)
#>  DBI           1.1.0      2019-12-15 [2] CRAN (R 3.6.2)
#>  desc          1.2.0      2018-05-01 [2] standard (@1.2.0)
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 3.6.3)
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.2)
#>  dplyr         1.0.0.9000 2020-06-16 [1] Github (tidyverse/dplyr@fd08fe9)
#>  e1071         1.7-3      2019-11-26 [2] CRAN (R 3.6.1)
#>  ellipsis      0.3.1      2020-05-15 [3] CRAN (R 3.6.3)
#>  evaluate      0.14       2019-05-28 [2] CRAN (R 3.6.0)
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.2)
#>  farver        2.0.3      2020-01-16 [1] CRAN (R 3.6.2)
#>  foreign       0.8-76     2020-03-03 [2] CRAN (R 3.6.2)
#>  fs            1.4.1      2020-04-04 [2] CRAN (R 3.6.3)
#>  generics      0.0.2      2018-11-29 [3] CRAN (R 3.5.1)
#>  geosphere     1.5-10     2019-05-26 [2] CRAN (R 3.6.0)
#>  ggbeeswarm    0.6.0      2017-08-07 [1] CRAN (R 3.6.1)
#>  ggplot2       3.3.2      2020-06-19 [1] CRAN (R 3.6.3)
#>  glue          1.4.1      2020-05-13 [2] CRAN (R 3.6.3)
#>  gtable        0.3.0      2019-03-25 [3] CRAN (R 3.5.3)
#>  highr         0.8        2019-03-20 [3] CRAN (R 3.5.3)
#>  htmltools     0.5.0.9000 2020-06-18 [1] Github (rstudio/htmltools@a8025f3)
#>  httr          1.4.1      2019-08-05 [2] CRAN (R 3.6.1)
#>  igraph        1.2.5      2020-03-19 [1] CRAN (R 3.6.3)
#>  KernSmooth    2.23-17    2020-04-26 [4] CRAN (R 3.6.3)
#>  knitr         1.28       2020-02-06 [1] CRAN (R 3.6.2)
#>  lattice       0.20-41    2020-04-02 [2] CRAN (R 3.6.3)
#>  lifecycle     0.2.0.9000 2020-03-16 [1] Github (r-lib/lifecycle@355dcba)
#>  lwgeom        0.2-5      2020-06-12 [1] CRAN (R 3.6.3)
#>  magrittr      1.5        2014-11-22 [2] CRAN (R 3.5.2)
#>  maptools      1.0-1      2020-05-14 [1] CRAN (R 3.6.3)
#>  memoise       1.1.0      2017-04-21 [3] CRAN (R 3.5.0)
#>  mime          0.9        2020-02-04 [1] CRAN (R 3.6.2)
#>  munsell       0.5.0      2018-06-12 [3] CRAN (R 3.5.0)
#>  pillar        1.4.4      2020-05-05 [1] CRAN (R 3.6.3)
#>  pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 3.6.3)
#>  pkgconfig     2.0.3      2019-09-22 [2] CRAN (R 3.6.1)
#>  pkgload       1.1.0      2020-05-29 [3] CRAN (R 3.6.3)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.2)
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 3.6.3)
#>  profmem       0.5.0      2018-01-30 [2] CRAN (R 3.5.2)
#>  pryr          0.1.4      2018-02-18 [1] CRAN (R 3.6.1)
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 3.6.3)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 3.6.3)
#>  R6            2.4.1      2019-11-12 [2] CRAN (R 3.6.1)
#>  raster        3.3-3      2020-06-18 [1] Github (rspatial/raster@d63b497)
#>  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 3.6.3)
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 3.6.2)
#>  rgeos         0.5-3      2020-05-08 [1] CRAN (R 3.6.3)
#>  rlang         0.4.6.9000 2020-06-22 [1] Github (r-lib/rlang@64df8e3)
#>  rmarkdown     2.3        2020-06-18 [1] CRAN (R 3.6.3)
#>  rprojroot     1.3-2      2018-01-03 [2] CRAN (R 3.5.3)
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.1.1      2018-11-05 [3] CRAN (R 3.5.1)
#>  sf          * 0.9-4      2020-06-22 [1] Github (r-spatial/sf@0b08ed5)
#>  sfnetworks  * 0.3.0      2020-06-22 [1] Github (luukvdmeer/sfnetworks@7baa168)
#>  sp            1.4-2      2020-05-20 [1] CRAN (R 3.6.3)
#>  stplanr       0.6.0      2020-06-01 [1] local
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.2)
#>  stringr       1.4.0      2019-02-10 [2] standard (@1.4.0)
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.3)
#>  tibble        3.0.1      2020-04-20 [1] CRAN (R 3.6.3)
#>  tidygraph     1.2.0      2020-05-12 [2] CRAN (R 3.6.3)
#>  tidyr         1.1.0      2020-05-20 [3] CRAN (R 3.6.3)
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 3.6.3)
#>  units         0.6-7      2020-06-13 [1] CRAN (R 3.6.3)
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 3.6.3)
#>  utf8          1.1.4      2018-05-24 [2] CRAN (R 3.5.3)
#>  vctrs         0.3.1      2020-06-05 [1] CRAN (R 3.6.3)
#>  vipor         0.4.5      2017-03-22 [1] CRAN (R 3.6.1)
#>  withr         2.2.0      2020-04-20 [2] CRAN (R 3.6.3)
#>  xfun          0.15       2020-06-21 [1] CRAN (R 3.6.3)
#>  xml2          1.3.2      2020-04-23 [3] CRAN (R 3.6.3)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
```

Robinlovelace commented 4 years ago

Heads-up, I've added continuous benchmarking in #64 but the build is failing due to credentials issues. That should be an easy fix. See here for details: https://github.com/r-lib/bench/issues/87

Any ideas of what else we should benchmark?

agila5 commented 4 years ago

Hi, and thanks for your work! IMO it's good enough for the moment: I think we should focus on testing the current functionality and fixing the bugs, and only then optimise the code and benchmark different implementations, also taking into account what @mvl22 said. Let's keep this issue open for the time being.

Did you understand why the build is failing? Sorry, but I have literally zero experience with GitHub Actions and benchmarks.

Robinlovelace commented 4 years ago

> Did you understand why the build is failing?

No, I'm not sure why the benchmarks are failing. One consideration: wonder if it's worth adding an optional edge_lengths parameter in as_sfnetwork(), which could be FALSE by default.

agila5 commented 4 years ago

> One consideration: wonder if it's worth adding an optional edge_lengths parameter in as_sfnetwork(), which could be FALSE by default.

IMO yes, if the network is created with explicit edges, since I have always used the edge lengths in the analysis after creating the network.
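For illustration, the idea could be prototyped outside the package with a thin wrapper. This is only a sketch: `as_sfnetwork_with_lengths()` and the `edge_length` column are hypothetical names, not part of the sfnetworks API, and it assumes the input is an sf object of linestrings whose attribute columns are carried over to the edges.

```r
# Hypothetical wrapper, NOT part of sfnetworks: optionally store edge lengths
# as a column on the input lines before the network is built.
as_sfnetwork_with_lengths <- function(x, edge_lengths = FALSE, ...) {
  if (edge_lengths) {
    # sf::st_length() returns one length per feature, in the units of the CRS
    x$edge_length <- sf::st_length(x)
  }
  sfnetworks::as_sfnetwork(x, ...)
}

# Usage, only run when both packages are installed
if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("sfnetworks", quietly = TRUE)) {
  net <- as_sfnetwork_with_lengths(sfnetworks::roxel, edge_lengths = TRUE)
  print(net)
}
```

With `edge_lengths = FALSE` as the default, the length calculation is skipped unless asked for, so a benchmark of the default call would not include it.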

luukvdmeer commented 3 years ago

Heads-up @Robinlovelace: the continuous benchmarking has been failing lately. Whenever you find the time, could you take a look? For me it is a mystery ;-)

Robinlovelace commented 3 years ago

Hi @luukvdmeer, yes, will do. Do we want to benchmark anything else?

Robinlovelace commented 3 years ago

Seems it has benchmarked things historically:

setwd("~/wip/sfnetworks/")
bench::cb_fetch()
d = bench::cb_read()
bench::cb_plot_time(d)
#> Loading required namespace: ggplot2
#> Loading required namespace: tidyr

Created on 2020-11-05 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

Just tried this locally and it worked with no errors:

bench::cb_run()

Robinlovelace commented 3 years ago

Not 100% sure how it works either. I have checked here https://github.com/r-lib/bench/actions?query=workflow%3A%22Continuous+Benchmarks%22 and cannot see build logs there either. The examples above show how to read benchmarks saved in the past; it would be useful to have a date.

TBH I do not fully understand continuous benchmarking. We could add simple benchmarks to a vignette instead. Thoughts @luukvdmeer ?

I have advocated for better documentation on the 'CB' approach in https://github.com/r-lib/bench/issues/87 but while we're waiting for that we could change tack.

luukvdmeer commented 3 years ago

I agree. Let's disable it for now until it matures. I saved the bench setup as we had it in a branch named bench. We can re-add that content later.

Robinlovelace commented 3 years ago

Great thinking. I can add something on benchmarking using system.time() - which vignette though?
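A minimal sketch of what a system.time()-based benchmark could look like, using base R only. The helper `time_median()` is a hypothetical name, and the two workloads are stand-ins; in the package they would be replaced by calls such as as_sfnetwork(roxel) and stplanr::SpatialLinesNetwork(roxel).

```r
# Hypothetical helper: run f() `times` times under system.time() and
# return the median elapsed time in seconds.
time_median <- function(f, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Stand-in workloads, one function per expression to compare
candidates <- list(
  sort_small = function() sort(runif(1e4)),
  sort_large = function() sort(runif(1e5))
)

# One row per expression, ready to print in a README or vignette
results <- data.frame(
  expression = names(candidates),
  median_elapsed_s = vapply(candidates, time_median, numeric(1)),
  row.names = NULL
)
print(results)
```

Taking the median of a few repeats smooths out one-off effects such as garbage collection, which a single system.time() call would include. It is much cruder than bench::mark(), but adds no dependencies.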

luukvdmeer commented 3 years ago

How about starting with some basic benchmarking (the currently existing ones) in a "Benchmarks" section in the README? Once we have more coverage of other functionalities (or if you already have them) we could dedicate a new vignette to it, focused only on benchmarking.

luukvdmeer / sfnetworks

Benchmarking #6