nptscot / rnetmatch

Match the features in two route networks, enabling joins
MIT License

Draft `rnet_aggregate()` #26

Closed JosiahParry closed 9 months ago

JosiahParry commented 9 months ago

This PR begins to address #22. It creates a user interface for weighted aggregation of numeric variables. It uses dplyr, which I am comfortable with for development.

Please review the draft. It performs a shared-length weighted aggregation of the variables in y. It does not join the results back onto x: doing so would require copying the data, which can be expensive, and we don't know how big x might be.
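
For context, the aggregation is roughly of this shape. This is a minimal dplyr sketch, not the PR's actual code; it assumes, hypothetically, that `matches` is a data frame with columns `i`, `j` and `shared_len`, and uses a shared-length weighted mean for a single column (the real column names and weighting may differ):

library(dplyr)
# Sketch only: weighted aggregation of one y column onto x indices.
# The column names i, j and shared_len are assumptions about `matches`.
matches |>
  left_join(
    sf::st_drop_geometry(y) |> mutate(j = row_number()),
    by = "j"
  ) |>
  group_by(i) |>
  summarise(
    Quietness = sum(Quietness * shared_len) / sum(shared_len),
    .groups = "drop"
  )

The reprex below runs the draft function on the intersection test data: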

library(sf)
#> Warning: package 'sf' was built under R version 4.3.1
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(rnetmatch)
x <- read_sf("data-raw/geojson/intersection_example_simple.geojson") |>
  sf::st_transform(27700)
y <- read_sf("data-raw/geojson/intersection_example_complex.geojson") |>
  sf::st_transform(27700)

matches <- rnetmatch::rnet_match(x, y, dist_tolerance = 10, angle_tolerance = 5)

rnet_aggregate(
  x, y, matches,
  # columns of y (the j side) to aggregate
  all_fastest_bicycle,
  Quietness,
  commute_quietest_bicycle_go_dutch
)
#> # A tibble: 10 × 4
#>        i all_fastest_bicycle Quietness commute_quietest_bicycle_go_dutch
#>    <int>               [1/m]     [1/m]                             [1/m]
#>  1     1                 0        90.0                             90.0 
#>  2     2                 0       100.                              90.0 
#>  3     3               153.       95.0                           2004.  
#>  4     4               153.       95.0                           2094.  
#>  5     5               219.      335.                             344.  
#>  6     6               134.      213.                              29.8 
#>  7     7                58.5      40.0                             84.7 
#>  8     8                65.7      48.0                            129.  
#>  9     9                12.0      27.4                             55.3 
#> 10    10                25.3      22.6                              4.91
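
If the aggregated values are wanted back on x, the caller can do the join, for example by row index. A sketch, assuming the returned `i` column indexes rows of x (as the output above suggests):

agg <- rnet_aggregate(x, y, matches, Quietness)
x$i <- seq_len(nrow(x))  # give x an explicit row-index column to join on
x_joined <- dplyr::left_join(x, agg, by = "i")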

As an aside, I'm strongly opposed to shipping software that imports dplyr, due to the frequency of breaking changes as well as its performance. We can address this later once we've figured it out.

Robinlovelace commented 9 months ago

Finally got time at a computer today. Will check it out (with gh):

gh pr checkout 26

Robinlovelace commented 9 months ago

> As an aside, I'm strongly opposed to shipping software that imports dplyr, due to the frequency of breaking changes as well as its performance. We can address this later once we've figured it out.

Fair, I've been burned more than once by this. I think {dplyr} is finally stable (although you never know), but my main reason for not wanting to import it is its size and number of additional deps.
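
For what it's worth, the core weighted summary could also be written dependency-free. A rough base R sketch, under the same hypothetical `matches` columns (i, j, shared_len) as the dplyr sketch above:

# Sketch only: the same weighted mean, without dplyr.
aggregate_base <- function(y, matches, cols) {
  y_attrs <- sf::st_drop_geometry(y)
  w <- matches$shared_len
  grp <- matches$i
  out <- lapply(cols, function(col) {
    v <- y_attrs[[col]][matches$j]
    as.numeric(tapply(v * w, grp, sum) / tapply(w, grp, sum))
  })
  data.frame(i = sort(unique(grp)), setNames(out, cols))
}
# e.g. aggregate_base(y, matches, c("Quietness", "all_fastest_bicycle"))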

Robinlovelace commented 9 months ago

Results from the example I'm looking at are looking good:

library(sf)
library(rnetmatch)
x = read_sf("data-raw/geojson/princes_street_minimal_x_1.geojson") |>
  sf::st_transform(27700)
y = read_sf("data-raw/geojson/princes_street_minimal.geojson") |>
  sf::st_transform(27700)
matches = rnetmatch::rnet_match(x, y, dist_tolerance = 10, angle_tolerance = 5)
y_aggregated = rnet_aggregate(x, y, matches, value)
# copy the x row index into an `id` column so the join below can match it
y_aggregated$id = y_aggregated$i
# transfer the aggregated values from y onto x
y_joined = dplyr::left_join(x, y_aggregated)
# compare the original y values with the values transferred to x
plot(y["value"], lwd = 5)
plot(y_joined["value"], lwd = 5)

Robinlovelace commented 9 months ago

Original y

[image: plot of the original y network]

And y_joined:

[image: plot of y_joined]

Robinlovelace commented 9 months ago

All on the same map:

[image: both networks plotted on the same map]

Robinlovelace commented 9 months ago

My only suggestion would be to remove the confusing units.
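
If those come from the units package (via sf's length arithmetic), one option, sketched here and not tested against this branch, is to strip the units class before returning:

agg <- rnet_aggregate(x, y, matches, Quietness)
agg <- units::drop_units(agg)  # hypothetical fix: drop units metadata from the columns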

JosiahParry commented 9 months ago

@Robinlovelace, do you think non-standard evaluation via `...` is the way to go here? I think it feels pretty okay. The alternative would be to provide a vector of column names. I'm open to both.
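
To make the two interfaces concrete (the second signature is hypothetical, not what this PR implements):

# 1. NSE via ..., as in the current draft:
rnet_aggregate(x, y, matches, Quietness, all_fastest_bicycle)

# 2. Alternative: a character vector of column names
#    (hypothetical `cols` argument, not implemented here):
# rnet_aggregate(x, y, matches, cols = c("Quietness", "all_fastest_bicycle"))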

Robinlovelace commented 9 months ago

NSE is fine for now; we're in "go fast and break stuff" mode. Will merge.