API for extracting convergence information

tpapp commented 9 years ago

It would be great if rstan had a standardized API for extracting convergence and sample size information from stanfit objects. The functionality is there, but user implementations are necessarily a bit ad hoc (see functions below) and use undocumented internals.

The following are toy implementations. I am not an S4 wizard and know very little about rstan internals; they are only meant to demonstrate what I would like to have and how I am doing it currently:

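library(rstan)

# Named vector of R-hat values, one per parameter (names come from the summary rows).
Rhat <- function(fit) {
  summary(fit)$summary[, "Rhat"]
}

# Named vector of effective sample sizes, one per parameter.
n_eff <- function(fit) {
  summary(fit)$summary[, "n_eff"]
}

# Total number of post-warmup draws, pooled across all chains.
sample_size <- function(fit) {
  length(extract(fit, pars = "lp__")[[1]])
}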
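For readers without a fitted model at hand, the following base-R sketch shows roughly what the `Rhat` column measures: the classic split R-hat (potential scale reduction), computed from draws arranged as iterations x chains x parameters, which as far as I know is the layout `as.array()` returns for a stanfit object. The function name `split_rhat` is hypothetical, and rstan's own estimator may differ in details, so treat this as an illustration rather than a replacement for `summary(fit)`.

```r
# Hypothetical helper: classic (non-rank-normalized) split R-hat per parameter.
# `draws` is an iterations x chains x parameters array whose third
# dimnames give the parameter names.
split_rhat <- function(draws) {
  n_iter <- dim(draws)[1]
  half <- n_iter %/% 2
  rhat_one <- function(x) {
    # Split every chain into its first and last `half` draws, so the
    # columns become 2 * n_chains sequences of equal length.
    x <- cbind(x[seq_len(half), , drop = FALSE],
               x[n_iter - half + seq_len(half), , drop = FALSE])
    n <- nrow(x)
    W <- mean(apply(x, 2, var))   # mean within-sequence variance
    B <- n * var(colMeans(x))     # between-sequence variance
    var_plus <- (n - 1) / n * W + B / n
    sqrt(var_plus / W)
  }
  # apply() over the third margin passes each parameter's
  # iterations x chains matrix and keeps the parameter names.
  apply(draws, 3, rhat_one)
}
```

Like the `Rhat()` helper, it returns a vector named by parameter; for a stanfit it would presumably be called as `split_rhat(as.array(fit))`.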

Then the user could eyeball plots like

hist(Rhat(fit))
hist(n_eff(fit)/sample_size(fit))

to detect convergence problems, or do things like

Rh <- Rhat(fit)
Rh[Rh > 1.1]

For the latter, it is very useful that the implementation above labels the vector elements with the variable names.
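The two checks above (R-hat above 1.1 and a low `n_eff(fit)/sample_size(fit)` ratio) can be combined into a single report. A minimal sketch follows; `flag_parameters` is a hypothetical name, the 1.1 cutoff comes from the example above, and the 0.1 ratio cutoff is an arbitrary assumption to adjust.

```r
# Hypothetical helper: report parameters whose diagnostics look suspicious.
# rhat, n_eff: named numeric vectors, e.g. from the Rhat() and n_eff() helpers.
# n_draws: total post-warmup draws, e.g. from sample_size().
flag_parameters <- function(rhat, n_eff, n_draws,
                            rhat_max = 1.1, min_eff_ratio = 0.1) {
  stopifnot(identical(names(rhat), names(n_eff)))
  eff_ratio <- n_eff / n_draws
  # which() drops NA comparisons (e.g. a NaN R-hat for a constant quantity).
  bad <- which(rhat > rhat_max | eff_ratio < min_eff_ratio)
  data.frame(parameter = names(rhat)[bad],
             Rhat = unname(rhat[bad]),
             eff_ratio = unname(eff_ratio[bad]),
             stringsAsFactors = FALSE)
}

# Usage with a fitted model (not run here):
# flag_parameters(Rhat(fit), n_eff(fit), sample_size(fit))
```

Returning a data frame keeps the variable names as a column, so the result stays readable when many parameters are flagged.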

maverickg commented 9 years ago

Thanks. It seems there are already too many functions, so I am not sure it is a good idea to add trivial ones, each of which at least needs documentation. It would not hurt much, though.

bob-carpenter commented 9 years ago

I think these would make much more sense with a reference object. I think I now see what Ben was talking about w.r.t. doc. That way, you could look at the fit object and see a bunch of utility functions hanging off of it. Or here, even a sub-object with utility functions.

In C++, something that would look like

fit.extra().Rhat();

where fit.extra() returns an object containing the non-parameter reports per iteration, like Rhat, step size, n_divergent, etc.
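Bob's `fit.extra().Rhat()` idea can be approximated in R with Reference Classes from the methods package. The sketch below is hypothetical: `Fit` and `FitExtra` are illustrative classes wrapping a toy summary matrix with `Rhat` and `n_eff` columns, not anything rstan provides.

```r
library(methods)

# Hypothetical sub-object holding the diagnostic "extras".
FitExtra <- setRefClass("FitExtra",
  fields = list(summary_matrix = "matrix"),
  methods = list(
    Rhat = function() summary_matrix[, "Rhat"],
    n_eff = function() summary_matrix[, "n_eff"]
  )
)

# Hypothetical fit object; extra() returns the diagnostics sub-object.
Fit <- setRefClass("Fit",
  fields = list(summary_matrix = "matrix"),
  methods = list(
    extra = function() FitExtra$new(summary_matrix = summary_matrix)
  )
)

# Toy summary matrix: rows are parameters, columns are diagnostics.
s <- matrix(c(1.00, 1.20, 950, 60), nrow = 2,
            dimnames = list(c("mu", "sigma"), c("Rhat", "n_eff")))
fit <- Fit$new(summary_matrix = s)
fit$extra()$Rhat()  # named vector: mu = 1.00, sigma = 1.20
```

This also shows the trade-off discussed next: `Rhat()` is only discoverable by inspecting `FitExtra`, not `Fit` itself.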

The trade-off here is that it's not clear from looking at fit that there's an Rhat method; you have to look at whatever the return type of extra() is.

I think we want to stay away from having to have a list of 100 functions at the very top level, though the advantage there is that you can see everything when you browse.

Bob

(In reply to tpapp's comment of Nov 21, 2014, 3:54 AM.)

tpapp commented 9 years ago

@bob-carpenter: I am not sure that is the way to go in R, which (AFAIK) has multimethods, not single dispatch like C++, so methods don't belong to objects.
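The S4 model tpapp refers to can be made concrete: methods are registered on a generic function rather than on a class, and dispatch can use the classes of several arguments at once. In the hypothetical sketch below, `ToyFit` is a stand-in class (not rstan's `stanfit`), and `Rhat` and `exceeds` are illustrative generics.

```r
library(methods)

# Hypothetical stand-in for a fit object.
setClass("ToyFit", slots = c(summary_matrix = "matrix"))

# The generic function owns the methods; the class does not.
setGeneric("Rhat", function(object, ...) standardGeneric("Rhat"))
setMethod("Rhat", "ToyFit", function(object, ...) {
  object@summary_matrix[, "Rhat"]
})

# Multiple dispatch: the method is selected on the classes of BOTH arguments.
setGeneric("exceeds", function(object, threshold) standardGeneric("exceeds"))
setMethod("exceeds", signature("ToyFit", "numeric"), function(object, threshold) {
  r <- Rhat(object)
  names(r)[which(r > threshold)]
})

s <- matrix(c(1.00, 1.20, 950, 60), nrow = 2,
            dimnames = list(c("mu", "sigma"), c("Rhat", "n_eff")))
toy <- new("ToyFit", summary_matrix = s)
Rhat(toy)          # named vector: mu = 1.00, sigma = 1.20
exceeds(toy, 1.1)  # "sigma"
```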

@maverickg: I understand that you want to limit the number of functions. Nevertheless, the information extracted by those functions is sometimes necessary, so IMO it would be better to have a standardized (if trivial) API instead of relying on internals of interim objects.

jrnold commented 9 years ago

With the inclusion of Reference Classes and new packages like R6 (used by dplyr), having methods tied to objects is becoming more mainstream in R.

But in the more traditional style of writing R code with S3 and S4, the object itself is the API for accessing the object. A few accessor-like functions exist, but they should be few and far between, as most information is accessed directly from the object using [[, $, and @. Some accessor-like generics, such as logLik, do exist, but a function only makes sense as a method if it is relevant to multiple classes. If it is relevant to only a single class, as I expect Stan functions will be, it should be named in a way that won't conflict with other functions or methods (see, e.g., issue #118), since by convention everything is put in the main namespace in R. Functions named that way end up more verbose than simply accessing the object through the extractor operators.

If there is a case to be made for functions, it would be to make rstan play well with magrittr's %>% chaining operator, which is gaining popularity.
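The chaining argument comes down to argument order: functions that take the object as their first argument compose naturally in a pipeline, whereas `$` and `@` access has to be nested. A minimal sketch, using R's native `|>` pipe (R 4.1 or later) in place of magrittr's `%>%`, which behaves the same way for simple calls like these; `rhat_from_summary` and `above` are hypothetical helpers operating on a toy summary matrix.

```r
# Hypothetical pipe-friendly helpers: the data is always the first argument.
rhat_from_summary <- function(summary_matrix) summary_matrix[, "Rhat"]
above <- function(x, threshold) x[which(x > threshold)]

# Toy summary matrix: rows are parameters, columns are diagnostics.
s <- matrix(c(1.00, 1.20, 950, 60), nrow = 2,
            dimnames = list(c("mu", "sigma"), c("Rhat", "n_eff")))

s |> rhat_from_summary() |> above(1.1)  # c(sigma = 1.2)
```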

jgabry commented 7 years ago

I'm closing this since the plan for moving to reference classes soon(ish) is now in place

https://github.com/stan-dev/stan/wiki/User-Interface-Guidelines-for-Developers

and I don't think we'll need to define any S3 or S4 methods like the ones proposed here to make the API consistent once that happens.

stan-dev/rstan issue #116