r-spatial / classInt

Choose Univariate Class Intervals
https://r-spatial.github.io/classInt/
33 stars 8 forks source link

Feature Request: logLik method? #6

Closed billdenney closed 5 years ago

billdenney commented 5 years ago

Could you please include a logLik() method for classIntervals objects? I took a first shot at one below, but it's not clear how to include a standard deviation when there is only one value in a group (or zero values in a group as can sometimes happen). I'm sure that I'm missing more corner cases, and it wouldn't surprise me if it were more efficient to generate the likelihood immediately after the calculation (before returning the original object):

logLik.classIntervals <- function(object, ...) {
  df <- length(object$brks) - 1
  current_loglik <- 0
  for (idx in seq_len(df)) {
    mask_current <-
      if (((idx == 1) & (attr(object, "intervalClosure") == "right")) |
          ((idx == df) & (attr(object, "intervalClosure") == "left"))) {
        object$brks[idx] <= object$var &
          object$var <= object$brks[idx + 1]
      } else if (attr(object, "intervalClosure") == "right") {
        object$brks[idx] < object$var &
          object$var <= object$brks[idx + 1]
      } else if (attr(object, "intervalClosure") == "left") {
        object$brks[idx] <= object$var &
          object$var < object$brks[idx + 1]
      }
    if (sum(mask_current)) {
      current_x <- object$var[mask_current]
      current_loglik <-
        current_loglik +
        if (length(unique(current_x)) == 1) {
          # Assume that the density is 1 at the unique value's location and zero
          # elsewhere.  Therefore the log-density is 0.
          0
        } else {
          sum(dnorm(x=current_x, mean=mean(current_x), sd=sd(current_x), log=TRUE))
        }
    }
  }
  structure(current_loglik, df=df, nobs=attr(object, "nobs"), class="logLik")
}
billdenney commented 5 years ago

Related to this, would you consider AIC-based automation of number of breaks as an alternate option to nclass.Sturges()?

rsbivand commented 5 years ago

Could you please motivate this with references to relevant literature (especially the thematic cartography literature)? It would also be very useful to see a use case based on the cited literature, where your proposals show how they implement approaches described in the references (and hopefully replicate them). Once we get that far, a PR would be needed, including the bits & pieces needed to ensure that the package after changes continues to pass R CMD check.

billdenney commented 5 years ago

I don't know thematic cartography literature, but there are references in statistical literature (section 2.3 in the review at http://www.numdam.org/article/PS_2009__13__181_0.pdf refers to several source papers, http://www.numdam.org/article/PS_2006__10__24_0.pdf being the most relevant to the statistics).

My rationale for needing the bins is related to the problem described in https://hal.archives-ouvertes.fr/hal-00642621/document where we are trying to bin data from clinical studies where sampling times are inconsistent and bins are useful to aid comparison between model-simulated results and observed results.

rsbivand commented 5 years ago

I can see at least some of your point, but this isn't a Pull Request. I don't have time to research and document this myself, but am receptive to a complete PR, with all necessary edits made, and fully reproducible examples from the literature, including the corner cases you mention. I have added a "dpih" style for Wand's plug-in automatic n approach in the github repo.

billdenney commented 5 years ago

Thanks for the feedback. I recognize that it's not a PR, but I wanted to confirm interest prior to completing the work for a PR. Now that I know there is interest, I'll work on the PR for it (it may take a bit of time, but it will come).

rsbivand commented 5 years ago

And please use the current code following https://github.com/r-spatial/classInt/commit/ab3e4e8fd63639e926c3393e4c7ba45a508c6b1b, though it shouldn't matter so much (not editing an existing function, adding a method).

rsbivand commented 5 years ago

You may carry on with this logLik() method PR, but please do provide necessary handholds in the documentation for the vast majority of users who are thematic map creators. This means a proper Rd file explaining what it is needed for, which style= arguments it may suit better than others, and examples that test (using try()) edge cases. Six months have passed since you raise this issue, so you do not need to close this off abruptly.

billdenney commented 5 years ago

I closed it because according to your comment on the PR, "I still do not see any motivation for this, no cartographer using the package needs this." And additionally, you didn't appear to like parts of my coding style (notably the use of roxygen) which I find helps me keep the code and documentation more consistent. Those suggested that it was unwelcome from two fronts.

With this response, you've given me specific items to modify which will help make it work within the package. I'll make another attempt, but I initially closed it as I didn't want to waste either of our time with something that was not going to improve the package.

rsbivand commented 5 years ago

I refer to a discussion on R-devel and a brief survey of roxygen leading to man pages that split into two groups, one group still in the S/R tradition of terse but dense man pages expounding the function or method documented, together with examples very often using the data sets shown in the references, and a second, larger, group using roxygen to fulfill the need to document only symbolically (I can look up the URL if need be; of course, I don't use roxygen because it has only been available since 2011 in some form, and this package predates it). You did provide references, but it would be very helpful to show that the method implements an example from a reference, and maybe why (using AIC()?). This particularly because it may very well be the case that thematic mappers might also grasp the possiblility of comparing bin designs.

billdenney commented 5 years ago

I'm perfectly happy to do my best to comply with your preferred coding style as it's your package. :)

I guess I fall into a third camp: I try to document in detail, but I prefer the simplification and standardization that roxygen provides. But again, your package gets your preferences.

I've updated with more details, and hopefully this is getting closer to your preferred design.

rsbivand commented 5 years ago

Fine, thanks. This is the analysis (weakly) supporting Duncan Murdoch's speculation. I'll comment in PR https://github.com/r-spatial/classInt/pull/12.

billdenney commented 5 years ago

Thanks for the pointer to the article.

billdenney commented 5 years ago

Thanks!