njtierney / maxcovr

Tools in R to make it easier to solve the Maximal Coverage Location Problem
http://maxcovr.njtierney.com/
GNU General Public License v3.0

create a coverage function #37

Status: Open. njtierney opened this issue 7 years ago.

njtierney commented 7 years ago

Currently, getting summary information about coverage requires something like this:

library(maxcovr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Subset to the places with towers built on them (the Grade I listed buildings).
york_selected <- york %>% filter(grade == "I")

# The remaining buildings (not used further below).
york_unselected <- york %>% filter(grade != "I")

# For each crime in york_crime, find the nearest selected building and its distance.
dat_dist <- york_selected %>% nearest(york_crime)

head(dat_dist)
#> # A tibble: 6 × 22
#>   to_id nearest_id   distance              category
#>   <dbl>      <dbl>      <dbl>                 <chr>
#> 1     1         66  165.85752 anti-social-behaviour
#> 2     2         48 2086.76298 anti-social-behaviour
#> 3     3         55   68.23116 anti-social-behaviour
#> 4     4         11  286.34132 anti-social-behaviour
#> 5     5         25  535.78713 anti-social-behaviour
#> 6     6         20  159.90888 anti-social-behaviour
#> # ... with 18 more variables: persistent_id <chr>, date <chr>,
#> #   lat_to <dbl>, long_to <dbl>, street_id <chr>, street_name <chr>,
#> #   context <chr>, id <chr>, location_type <chr>, location_subtype <chr>,
#> #   outcome_status <chr>, long_nearest <dbl>, lat_nearest <dbl>,
#> #   object_id <int>, desig_id <chr>, pref_ref <int>, name <chr>,
#> #   grade <chr>

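# Flag each crime as covered if it lies within 100 metres of a selected
# building, then summarise.
dat_dist %>% 
    mutate(is_covered = distance <= 100) %>%
    summarise_coverage()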
#> # A tibble: 1 × 7
#>   distance_within n_cov n_not_cov   pct_cov pct_not_cov dist_avg  dist_sd
#>             <dbl> <int>     <int>     <dbl>       <dbl>    <dbl>    <dbl>
#> 1             100   339      1475 0.1868798   0.8131202 1400.192 1596.676
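The summary columns above can be reproduced from the distance column alone, which may help when refactoring summarise_coverage(). The sketch below is a hypothetical base-R helper (summarise_coverage_sketch() is not part of maxcovr). The output above confirms that pct_cov is a proportion (339 / 1814 = 0.1868798). The sketch also assumes that dist_avg and dist_sd are the mean and standard deviation of all nearest distances, covered or not.

```r
# Hypothetical helper, not part of maxcovr: rebuilds the summary columns
# from a numeric vector of nearest distances and a coverage cutoff.
summarise_coverage_sketch <- function(distance, distance_cutoff = 100) {
  is_covered <- distance <= distance_cutoff
  n_cov <- sum(is_covered)
  n_not_cov <- sum(!is_covered)
  data.frame(
    distance_within = distance_cutoff,
    n_cov = n_cov,
    n_not_cov = n_not_cov,
    # proportions of all rows, not percentages
    pct_cov = n_cov / length(distance),
    pct_not_cov = n_not_cov / length(distance),
    # assumed: mean and sd over all distances, covered or not
    dist_avg = mean(distance),
    dist_sd = stats::sd(distance)
  )
}

summarise_coverage_sketch(c(50, 80, 150, 400), distance_cutoff = 100)
```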

A function such as coverage() (or something slightly more descriptive) should take the same arguments as nearest(), but return the coverage summary instead of the row-level distances.

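coverage <- function(nearest_df,
                     to_df, 
                     distance_cutoff = 100){

    nearest_df %>% 
        # distance from each row of to_df to its nearest row of nearest_df
        nearest(to_df) %>%
        # a row of to_df is covered if it lies within distance_cutoff
        dplyr::mutate(is_covered = distance <= distance_cutoff) %>%
        summarise_coverage()

}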

york_selected %>% coverage(york_crime)
#> # A tibble: 1 × 7
#>   distance_within n_cov n_not_cov   pct_cov pct_not_cov dist_avg  dist_sd
#>             <dbl> <int>     <int>     <dbl>       <dbl>    <dbl>    <dbl>
#> 1             100   339      1475 0.1868798   0.8131202 1400.192 1596.676
york_crime %>% coverage(york_selected)
#> # A tibble: 1 × 7
#>   distance_within n_cov n_not_cov   pct_cov pct_not_cov dist_avg  dist_sd
#>             <dbl> <int>     <int>     <dbl>       <dbl>    <dbl>    <dbl>
#> 1             100    54        17 0.7605634   0.2394366 119.9247 247.2918
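The two calls give different answers because coverage is directional. york_selected %>% coverage(york_crime) reports the share of crimes (1814 rows) within the cutoff of a selected building. york_crime %>% coverage(york_selected) reports the share of selected buildings (71 rows) within the cutoff of a crime. The toy sketch below illustrates this asymmetry. The helpers nearest_distance() and share_covered() are hypothetical, and they use planar x/y coordinates for simplicity, whereas nearest() works with longitude and latitude.

```r
# Hypothetical helper: for each row of `to`, the Euclidean distance to the
# closest row of `from`. Planar x/y coordinates are assumed for simplicity.
nearest_distance <- function(from, to) {
  apply(to, 1, function(p) {
    min(sqrt((from[, "x"] - p[["x"]])^2 + (from[, "y"] - p[["y"]])^2))
  })
}

# Share of rows in `to` that lie within `cutoff` of some row in `from`
share_covered <- function(from, to, cutoff = 100) {
  mean(nearest_distance(from, to) <= cutoff)
}

# Toy data: two facilities and five events on a line
facilities <- cbind(x = c(0, 1000), y = c(0, 0))
events <- cbind(x = c(10, 50, 990, 500, 520), y = c(0, 0, 0, 0, 0))

share_covered(facilities, events)  # 3 of 5 events are covered: 0.6
share_covered(events, facilities)  # both facilities have an event nearby: 1
```

Swapping the arguments changes which data frame is counted, so the denominator of pct_cov changes too.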

I'll probably need to refactor summarise_coverage() soon. It would be good if it worked properly with group_by().
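One way to picture group-aware behaviour is a per-group version of the summary, with one row per group. The sketch below uses base R's split() as a stand-in for group_by(). coverage_by_group() is a hypothetical name, and it only computes the three count and proportion columns.

```r
# Hypothetical sketch: one coverage row per group, as a stand-in for what a
# group_by()-aware summarise_coverage() might return.
coverage_by_group <- function(distance, group, distance_cutoff = 100) {
  pieces <- lapply(split(distance, group), function(d) {
    data.frame(
      n_cov = sum(d <= distance_cutoff),
      n_not_cov = sum(d > distance_cutoff),
      pct_cov = mean(d <= distance_cutoff)
    )
  })
  out <- do.call(rbind, pieces)
  # split() orders groups by sorted level; keep the label as a column
  out$group <- names(pieces)
  rownames(out) <- NULL
  out[, c("group", "n_cov", "n_not_cov", "pct_cov")]
}

coverage_by_group(
  distance = c(50, 150, 20, 30, 500),
  group = c("a", "a", "b", "b", "b")
)
```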

njtierney commented 7 years ago

Add a print method for this that explains what the results represent.

Specifically, it should print a header such as "Coverage of df1 on df2" that clearly states what the function computed.
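A minimal S3 sketch of such a print method follows. It assumes that coverage() would tag its result with a class and store the names of the two input data frames as attributes. The class name coverage_summary, the constructor new_coverage_summary(), and the attribute names are all hypothetical.

```r
# Hypothetical constructor: tag the summary with a class and remember which
# data frames were involved, so print() can describe the result.
new_coverage_summary <- function(summary_df, nearest_name, to_name) {
  structure(summary_df,
            nearest_name = nearest_name,
            to_name = to_name,
            class = c("coverage_summary", class(summary_df)))
}

print.coverage_summary <- function(x, ...) {
  cat(sprintf("Coverage of %s on %s\n",
              attr(x, "nearest_name"), attr(x, "to_name")))
  # drop the custom class so the underlying data frame prints as usual
  print(structure(x, class = setdiff(class(x), "coverage_summary")), ...)
  invisible(x)
}

s <- new_coverage_summary(data.frame(n_cov = 339L, n_not_cov = 1475L),
                          nearest_name = "york_selected",
                          to_name = "york_crime")
print(s)
```

Storing the names as attributes rather than extra columns keeps the summary's columns unchanged for downstream code.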