njtierney / maxcovr

Tools in R to make it easier to solve the Maximal Coverage Location Problem
http://maxcovr.njtierney.com/
GNU General Public License v3.0
42 stars, 11 forks

Example consistency #66

Closed: carbonmetrics closed this issue 6 years ago

carbonmetrics commented 6 years ago

I'm not sure whether this is an issue, but in your example for max_coverage you first call the function with user = york_crime, and later, when preparing the plot, with user = york. That was confusing for me while following your example, since I don't use purrr.

BTW, the plot is much easier and cleaner to get with foreach than via the purrr route:

library(maxcovr)
library(foreach)
library(ggplot2)

# Numbers of facilities to add; one model is fitted per value
vec = c(0, 20, 40, 60, 80, 100)

res = foreach(i = seq_along(vec), .combine = rbind) %do% {

  l = max_coverage(york_selected,
                   york_unselected,
                   york,
                   n_added = vec[i],
                   distance_cutoff = 100)

  # Keep only the two coverage columns the plot needs,
  # with names that match the aes() call below
  o = l$model_coverage[[1]]
  as.data.frame(o[, c("n_added", "pct_cov")])

}

ggplot(res, aes(n_added, pct_cov)) + geom_line() + geom_point() + theme_minimal()

As an added bonus, replacing %do% with %dopar% and registering a parallel backend gives you parallel processing, which should improve performance considerably.
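For readers who have not used %dopar% before, here is a minimal sketch of what that swap might look like. It assumes the doParallel package as the backend (any registered backend should work), and fit_one() is a hypothetical stand-in for a single max_coverage() fit so that the snippet runs without the maxcovr data; the worker count of 2 is arbitrary.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for one max_coverage() fit. In real use, replace the
# body with the max_coverage() call and the model_coverage extraction.
fit_one <- function(n_added) {
  data.frame(n_added = n_added, pct_cov = min(100, 10 + 0.8 * n_added))
}

vec <- c(0, 20, 40, 60, 80, 100)

# Start two worker processes and register them as the foreach backend
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Same loop as the sequential version, with %do% replaced by %dopar%
res <- foreach(i = seq_along(vec), .combine = rbind) %dopar% {
  fit_one(vec[i])
}

# Always release the workers when done
parallel::stopCluster(cl)

print(res)
```

When the loop body calls functions from a package, such as max_coverage(), the workers need that package loaded as well; passing .packages = "maxcovr" to foreach() is one way to do that.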

njtierney commented 6 years ago

Hello,

Thank you for submitting an issue, and for your comments.

> Not sure whether this is an issue, but in your example for max_coverage, you first call the function with user = york_crime, but later in the preparation for the plot with user = york.

I think I found the part you were referring to in the README, and I have updated it now. Let me know if that isn't the part you meant.

Future versions of maxcovr will have a better vectorized approach for n_added - see #19

I like the idea of using foreach to speed this up! Thanks for that. I will try to incorporate something similar when I get to #19.

Ultimately, I don't want the user to need to worry about writing foreach or purrr code to fit the model many times; the results should come back vectorized.

Another thing to consider when creating these outputs is that the user should have good control over the output and summaries, which means working out a good way to present the results. I am still working on how best to do this, but I think that at the moment there is good utility in presenting results in a data.frame, where each row is a model.
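To make the "one row per model" idea concrete, here is a rough sketch in base R. It is not maxcovr's API: fit_one() is a hypothetical stand-in for a single model fit, and the column names are assumptions chosen for illustration. Each row holds the summary values for one fit, plus the full fit object in a list-column.

```r
# Hypothetical stand-in for one model fit; returns a small list of results
fit_one <- function(n_added) {
  list(n_added = n_added, pct_cov = min(100, 10 + 0.8 * n_added))
}

n_added_values <- c(0, 20, 40, 60, 80, 100)

# Fit one model per value of n_added
fits <- lapply(n_added_values, fit_one)

# One row per model: the input, a summary statistic, and the fit itself
results <- data.frame(n_added = n_added_values)
results$pct_cov <- vapply(fits, function(f) f$pct_cov, numeric(1))
results$model <- fits  # list-column holding each full fit object

# Ordinary data.frame subsetting then selects models of interest
results[results$n_added >= 60, c("n_added", "pct_cov")]
```

Keeping the full fit in a list-column means summaries can be recomputed later without refitting, while the plain columns stay easy to plot or filter.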