jeffreypullin / rater

R package to fit statistical models to repeated categorical rating data using Stan
https://jeffreypullin.github.io/rater
GNU General Public License v2.0

Rethink point estimate interface #50

Closed: jeffreypullin closed this issue 4 years ago

jeffreypullin commented 4 years ago

We need to decide on the optimal interface for extracting point estimates of the three Dawid-Skene parameters.

There are two things to consider here:

What names should we use for the parameters? Principally, should we favor mathematical names (i.e. pi, theta, z) or names based on the interpretation of the parameters (i.e. prevalence probabilities, error matrices, latent class)?

What should the function we use to extract the point estimates be called? Some options:

cc @dvukcevic Thoughts?

dvukcevic commented 4 years ago

Nice summary, @jeffreypullin!

I am leaning towards the third option. Here are my thoughts on each:

jeffreypullin commented 4 years ago

I'm also leaning towards point_estimates. Some more thoughts re the interface:

I (at least currently) think estimates() alone is too vague.

At this point I think it's worth taking a step back and considering how point_estimates() would fit into the rest of the interface. My current plan is to (eventually) implement most of the generics from {rstantools}, which are listed here. For 0.2 I hope to implement:

A noticeable omission from the generics on that page is a generic to return point estimates as we are discussing. I think this is because {brms} and {rstanarm} both use coef() for this purpose. In an ideal world it would be nice to tie into the posterior_* theme, but I'm not very happy with any of my ideas:

Actually I think point_estimates() might be the best idea after all!

dvukcevic commented 4 years ago

I agree!

One remaining question: Should it be plural (point_estimates()) or singular (point_estimate())?

Other similar functions seem to generally be singular ('interval', 'coef') so I'm inclined to follow suit. You can think of the returned value being a single point estimate of a multi-dimensional quantity rather than a large number of separate estimates, so it makes sense to use singular in that light.

jeffreypullin commented 4 years ago

Good point. point_estimate() it is! I'll implement it later today/tomorrow.

jeffreypullin commented 4 years ago

Or maybe today....

jeffreypullin commented 4 years ago

I think I should stop promising specific deadlines... Anyway, a draft PR is up now.

One question I have is what point_estimate(fit, pars = "pi") should return. Should it be:

  1. A vector of length K
  2. A list of length 1 holding a vector of length K

When we have something like:

point_estimate(fit, pars = c("pi", "theta"))

we are forced to return a list because the parameter shapes are complex. Returning a list in the length-1 case (i.e. option 2) would be better for consistency but requires more work from the user.
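To make the two options concrete, here is a minimal sketch in base R. It is not the {rater} implementation: `draws` is a hypothetical stand-in for a fitted model (random numbers, not real posterior draws), `point_estimate_sketch()` is an illustrative name, and using the posterior mean as the point estimate is an assumption. It shows option 2, where the return value is always a named list regardless of how many parameters are requested.

```r
# Hypothetical stand-in for a fitted model: draws stored as plain arrays.
# K = 3 classes, J = 2 raters, 100 draws. Values are random, for shape only.
set.seed(1)
K <- 3; J <- 2; n_draws <- 100
draws <- list(
  pi    = matrix(runif(n_draws * K), nrow = n_draws, ncol = K),
  theta = array(runif(n_draws * J * K * K), dim = c(n_draws, J, K, K))
)

# Sketch of option 2: always return a named list, one element per parameter.
point_estimate_sketch <- function(draws, pars = c("pi", "theta")) {
  pars <- match.arg(pars, several.ok = TRUE)
  out <- lapply(pars, function(p) {
    x <- draws[[p]]
    # Average over the first (draw) dimension, keeping the remaining ones.
    apply(x, seq_along(dim(x))[-1], mean)
  })
  names(out) <- pars
  out
}

one  <- point_estimate_sketch(draws, pars = "pi")               # list of length 1
both <- point_estimate_sketch(draws, pars = c("pi", "theta"))   # list of length 2
```

With this design `one$pi` is a vector of length K and `both$theta` is a J x K x K array, so calling code can index by parameter name without checking how many parameters were requested.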

dvukcevic commented 4 years ago

I think option 2 (a list of length 1) is best. I can foresee many potential bugs otherwise!

We could potentially offer an argument such as unlist or drop as a convenience feature, but it's probably not worth the effort. A user who cares can simply do unlist(point_estimate(fit, "pi")) on their own.
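As a rough illustration of the user-side step, the snippet below compares `unlist()` with direct extraction. The object `est` is a hand-written, hypothetical stand-in for what point_estimate(fit, "pi") would return under option 2; the numbers are made up.

```r
# Hypothetical return value under option 2: a list of length 1 holding a
# vector of length K (here K = 3). The values are illustrative only.
est <- list(pi = c(0.2, 0.3, 0.5))

via_unlist  <- unlist(est)   # flattens; element names become pi1, pi2, pi3
via_extract <- est[["pi"]]   # pulls out the vector unchanged, without names
```

Both approaches recover the same numbers. Note that `unlist()` builds element names from the list name and always flattens to a vector, so for an array-valued parameter such as theta, `[[` extraction would keep the dimensions while `unlist()` would not.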