ropensci / stplanr

Sustainable transport planning with R
https://docs.ropensci.org/stplanr
Other
416 stars 66 forks source link

Possible speed enhancement to `mats2line()` #539

Closed JosiahParry closed 9 months ago

JosiahParry commented 10 months ago

I wanted to propose a possible speed enhancement for mats2line(). The current function uses lapply() which is great for smaller amounts of data but when the size gets larger, its performance slows down greatly. This is a semi-vectorized approach that is much more performant at higher volumes though I will note that the use of sfheaders increases the memory consumption but as you can see here, for 10k points, its marginal.

library(stplanr)

m1 <- matrix(rnorm(20000), ncol = 2)
m2 <- matrix(runif(20000), ncol = 2)

mats2line2 <- function(m1, m2) {
  m <- vctrs::vec_interleave(m1,m2)
  m <- cbind(m, rep.int(1:nrow(m1), rep(2, nrow(m1))))

  colnames(m) <- c("x", "y", "id")

  sfheaders::sfc_linestring(
    m,
    x = "x",
    y = "y", 
    linestring_id = "id"
  )
}

bench::mark(
  mats2line(m1, m2),
  mats2line2(m1, m2)
)

#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 mats2line(m1, m2)   74.65ms  84.97ms      10.9   10.54MB     50.8
#> 2 mats2line2(m1, m2)   5.87ms   6.55ms     124.     3.96MB     32.4
JosiahParry commented 10 months ago

One note is that this would require vctrs to become an explicit dependency rather than an implicit one as it presently is (due to recursive import of dplyr etc.)

Robinlovelace commented 10 months ago

Agreed, and big :+1: to this looking at the old code

https://github.com/ropensci/stplanr/blob/9661722564d550c24a62d7abcbb4ea8bff55435b/R/line_via.R#L19-L29

I say old because the function should probably be superseded by a function in the {od} package which {stplanr} already imports

https://github.com/ropensci/stplanr/blob/9661722564d550c24a62d7abcbb4ea8bff55435b/DESCRIPTION#L50

Wondering if the function should move across and maybe call od::matrices_to_od() or whatever the function ends up being called. Thoughts?

See here for the closest thing in that new package, the aim of which was and is still (when time allows) to split out all the OD processing code from {stplanr} which is, truth be told, a bit of a bloated metapackage... So any wider points on general 'where to put things' etc welcome, big refactor of this and related package could be overdue.

Robinlovelace commented 10 months ago

https://itsleeds.github.io/od/reference/coords_to_od.html