ropensci / stplanr

Sustainable transport planning with R
https://docs.ropensci.org/stplanr
Other
416 stars 66 forks source link

522 speed up rnet merge #550

Closed Robinlovelace closed 9 months ago

Robinlovelace commented 9 months ago
Robinlovelace commented 9 months ago

Please review @JosiahParry if you get a chance. Will do a benchmark...

Robinlovelace commented 9 months ago

OK: now I realise why I thought the vectorised version failed, there's evidence of it not working.

devtools::check()
> rnetj2 = rnet_join(osm_net_example, route_network_small, dist = 9, segment_length = 10)
   Warning: attribute variables are assumed to be spatially constant throughout all geometries
   Warning in st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING") :
     repeating attributes for all sub-geometries for which they may not be constant
   Joining with `by = join_by(osm_id)`
   Error in res_tbl[[attr(l, "sf_column")]] <- res : 
     more elements supplied than there are to replace
   Calls: rnet_join ... line_segment -> line_segment.sf -> line_segment_rsgeo

Can you reproduce this and possibly jump in to finish this PR please @JosiahParry I'm off to bed z z

Robinlovelace commented 9 months ago

It's now vectorised but doesn't seem to be any faster for the test example.

Robinlovelace commented 9 months ago

The rsgeo bit is ~3x faster, not sure why the function still seems slow:

system.time({res <- line_segment_rsgeo(l, n_segments = n_segments)})
debug at #1: res <- line_segment_rsgeo(l, n_segments = n_segments)
Browse[3]> c
   user  system elapsed 
  0.565   0.037  10.401 
Robinlovelace commented 9 months ago

Reprex to test-out speeds after checking out this branch:

library(stplanr)
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")

name_list = names(rnet_y)
funs = list()

# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
  if (name == "geometry") {
    next  # Skip the current iteration
  } else if (name %in% c("Gradient", "Quietness")) {
    funs[[name]] = mean
  } else {
    funs[[name]] = sum
  }
}

runtime = system.time({
  rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
nrow(rnet_x) / runtime[3]
Robinlovelace commented 9 months ago

Bottleneck found:

system.time(line_bearing(rnet_y, bidirectional = TRUE))

   user  system elapsed 
156.930   1.093 158.997 
Browse[1]> system.time(line_segment(rnet_y, segment_length = segment_length))
   user  system elapsed 
  0.829   0.004   0.831

Will open an new issue...

Robinlovelace commented 9 months ago

551

Robinlovelace commented 9 months ago

Getting a 100x speed-up now :tada:

Robinlovelace commented 9 months ago
runtime = system.time({
  rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
nrow(rnet_x) / runtime[3]
Joining with `by = join_by(identifier)`
Joining with `by = join_by(identifier)`
Warning messages:
1: attribute variables are assumed to be spatially constant throughout all geometries 
2: In st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING") :
  repeating attributes for all sub-geometries for which they may not be constant
3: st_centroid assumes attributes are constant over geometries 
 elapsed 
653.0758