ropensci / stplanr

Sustainable transport planning with R
https://docs.ropensci.org/stplanr
Other
419 stars 66 forks source link

Use `od::odc_to_sfc()` do the legwork in `mats2line()` #544

Closed Robinlovelace closed 1 year ago

Robinlovelace commented 1 year ago

As outlined here https://github.com/ropensci/stplanr/blob/master/data-raw/test-mats2line.md there is already a computationally more efficient implementation of mats2line() in the {od} package:

bench::mark(
  mats2line_old(m1, m2),
  mats2line(m1, m2),
  od::odc_to_sfc(cbind(m1, m2))
)
## Warning: Some expressions had a GC in every iteration; so filtering is
## disabled.

## # A tibble: 3 × 6
##   expression                         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 mats2line_old(m1, m2)          217.3ms  218.5ms      4.55  518.21KB    15.2 
## 2 mats2line(m1, m2)               14.4ms   16.5ms     48.9     4.01MB    15.7 
## 3 od::odc_to_sfc(cbind(m1, m2))   16.2ms   18.2ms     47.6     5.01MB     7.93

Given the advantages of modularity, and that the purpose of the {od} package is to work with origin-destination data, it makes sense to use the {od} implementation.

Motivation: #539

@JosiahParry does that sound reasonable, to basically replace that function with od::odc_to_sfc(cbind(m1, m2)), keeping some of the checks?

And can you spot any speed-up opportunities in odc_to_sfc() ?

Quick glance at the source code I wonder if this could be made any faster, e.g. by using vctrs?

od_coordinates_ids = function(odc) {
  res = data.frame(id = rep(1:nrow(odc), each = 2), x = NA, y = NA)
  ids_odd = seq(1, nrow(res), by = 2)
  ids_even = ids_odd + 1
  res[ids_odd, c("x", "y")] = odc[, 1:2]
  res[ids_even, c("x", "y")] = odc[, 3:4]
  res
}
JosiahParry commented 1 year ago

Yeah, checks out. Particularly since its already imported!

Robinlovelace commented 1 year ago

And I've just checked we can make the future upstream implementation faster thanks to your {vctrs} knowledge.

Robinlovelace commented 1 year ago

The {od} implementation uses the following:

od_coordinates_ids = function(odc) {
  res = data.frame(id = rep(1:nrow(odc), each = 2), x = NA, y = NA)
  ids_odd = seq(1, nrow(res), by = 2)
  ids_even = ids_odd + 1
  res[ids_odd, c("x", "y")] = odc[, 1:2]
  res[ids_even, c("x", "y")] = odc[, 3:4]
  res
}
od_coordinates_ids(odc)
##   id           x          y
## 1  1  0.54634963 -1.9157423
## 2  1  0.38803029  0.9626673
## 3  2 -0.75469220  0.4153730
## 4  2  0.02065935  0.3727175
## 5  3 -1.03140234 -0.1922811
## 6  3  0.16871653  0.7634995

Let’s see if we can speed that up:

od_coordinates_ids2 = function(odc) {
  res = vctrs::vec_interleave(odc[, 1:2], odc[, 3:4])
  res = data.frame(id = rep(1:nrow(odc), each = 2), x = res[, 1], y = res[, 2])
  res
}
waldo::compare(
  od_coordinates_ids(odc),
  od_coordinates_ids2(odc)
)
## ✔ No differences
odc = cbind(m1, m2)
bench::mark(
  od = od_coordinates_ids(odc),
  vctrs = od_coordinates_ids2(odc)
)
## # A tibble: 2 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 od           2.75ms   2.84ms      343.    3.94MB     33.9
## 2 vctrs      876.61µs 902.94µs     1092.    1.74MB     50.8