mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
722 stars 59 forks source link

Invalid argument 'dpar' while extracting fitted from nonlinear brms model #99

Closed jsta closed 6 years ago

jsta commented 6 years ago

The following data is from vignette("brms_nonlinear"). My guess is that an error is thrown because the model is nonlinear but I am not completely sure.

library(brms)
library(tidybayes)
library(modelr)
library(magrittr)

b <- c(2, 0.75)
x <- rnorm(100)
y <- rnorm(100, mean = b[1] * exp(b[2] * x))
dat1 <- data.frame(x, y)

prior1 <- c(prior(normal(1, 2), nlpar = "b1"),
            prior(normal(0, 2), nlpar = "b2"))
fit1 <- brm(bf(y ~ b1 * exp(b2 * x), b1 + b2 ~ 1, nl = TRUE),
            data = dat1, prior = prior1)

data_grid(dat1, x = seq_range(x, 51)) %>% 
  add_fitted_samples(fit1)

Error: Invalid argument 'dpar'. Valid distributional parameters are: 'mu', 'sigma'
In addition: Warning message:
In names(dpars)[missing_names] = dpars[missing_names] :

 Show Traceback

 Rerun with Debug
 Error: Invalid argument 'dpar'. Valid distributional parameters are: 'mu', 'sigma' 
jsta commented 6 years ago

It works without error if the "dpar" value is instead passed as "nlpar". See https://github.com/mjskay/tidybayes/blob/master/R/add_predicted_samples.R#L221

tmastny commented 6 years ago

I think I see what is going on, but Matt will have a better idea. brms allows the user to set a distribution parameter with the dpar argument, either mu or sigma. tidybayes uses names(model$formula$pforms) to check if those parameters were included and passes them along to brms.

However, in your model we get the following:

> names(fit1$formula$pforms)
[1] "b1" "b2"

tidybayes tries to pass those to the brms parameter dpar where it understandably fails.

mjskay commented 6 years ago

Yup, that sounds like the issue. Looking into this more, I think a more correct way to get a list of dpars and nlpars from a brmsfit is going to be to use brms::parse_bf.

Another question is if the auxpars name was premature abstraction on my part, and just keeping it as dpars and nlpars in fitted_samples makes the most sense. Unless I've missed something (very possible) the only other package I am aware of that this type of thing applies to at the moment is rethinking (via tidybayes.rethinking, which anyway needs to be updated). So the ideal fix might be to come up with a way to distinguish dpars and nlpars in brms and then split auxpars into dpars and nlpars in fitted_samples.

tmastny commented 6 years ago

Are there plans to merge tidybayes.rethinking into the base package?

tmastny commented 6 years ago

@mjskay Based on the brms source code that throws the error extract_draws is one option to get a list of valid dpars for a brmsfit object.

For example, using the same fit1 from the original post:

> names(brms:::extract_draws(fit1)$dpars)
[1] "mu"    "sigma"

corresponds to the valid distributional parameters listed in the error message. Likewise, here is another example:

library(brms)
set.seed(1234)
dat <- data.frame(
  count = rpois(236, lambda = 20),
  visit = rep(1:4, each = 59),
  patient = factor(rep(1:59, 4)),
  Age = rnorm(236), 
  Trt = factor(sample(0:1, 236, TRUE)),
  AgeSD = abs(rnorm(236, 1)),
  Exp = sample(1:5, 236, TRUE),
  volume = rnorm(236) 
)

fit2 <- brm(
  bf(count ~ Age + (1|visit), mu2 ~ Age), dat,
  family = mixture(gaussian, exponential),
  prior = c(prior(normal(0, 10), Intercept, dpar = mu1),
            prior(normal(0, 1), Intercept, dpar = mu2),
            prior(normal(0, 1), dpar = mu2)),
  warmup = 150, iter = 200, chains = 2
)
fit2$formula$pforms
fitted(fit2, dpar = 'm2')

and the error:

 Error: Invalid argument 'dpar'. Valid distributional parameters are: 'mu1', 'sigma1', 'mu2', 'theta' 

and the corresponding extract_draws lookup:

> names(brms:::extract_draws(fit2)$dpars)
[1] "mu1"    "sigma1" "mu2"    "theta" 

The problem is that extract_draws is a relatively expensive call for simply identifying the names of the parameters. The problem, as seen in the source code, is that parsing out the dpars appears to be tightly intertwined in the extract_draws function used for the fitted method.

Emphasis on relatively expensive:

> start_time <- Sys.time()
> names(brms:::extract_draws(fit2)$dpars)
[1] "mu1"    "sigma1" "mu2"    "theta" 
> end_time <- Sys.time()
> end_time - start_time
Time difference of 0.117537 secs
mjskay commented 6 years ago

I think this might sort of work (and uses an exported function from brms):

> names(brms::parse_bf(fit1$formula)$dpars)
[1] "mu"

And:

> names(brms::parse_bf(fit2$formula)$dpars)
[1] "mu2" "mu1"

Because these work:

fitted(fit2, dpar = "mu1")
fitted(fit2, dpar = "mu2")

But these (which are in the results of the extract_draws call) do not:

> fitted(fit2, dpar = "sigma1")
Error: Distributional parameter 'sigma1' was not predicted.
> fitted(fit2, dpar = "theta")
Error: Distributional parameter 'theta' was not predicted.

Only problem is that this does not work:

> fitted(fit1, dpar = "mu")
Error: Distributional parameter 'mu' was not predicted.

But I'm pretty sure we can just treat mu as implying dpar = NULL, because this does work:

fitted(fit1, dpar = NULL)

When I have time (possibly Thursday) I am going to try to write this up into a coherent issue to open on the brms repo. I think that should help us clear things up.

tmastny commented 6 years ago

Good find! The discrepancy between the "valid distributional parameters" given by the error message and what values work with the fitted method is confusing.

mjskay commented 6 years ago

I added a short term (hackish) fix for this based on brms::parse_bf, and opened an issue on brms (paul-buerkner/brms#318) so that we can come up with a long term solution to do this correctly.

I'll leave this issue open until there is a long term fix. In the mean time, let me know if you come across anything the short term fix does not work for.

mjskay commented 6 years ago

Looks like brms::parse_bf is the canonical approach (but without the current hack of removing mu, which is no longer necessary in recent versions of brms).

However, probably extracting these automatically by default (having default auxpars = TRUE) is the wrong approach. Instead, I think it makes sense to have the default be not to include the auxpars (auxpars = FALSE), and allow the user to either attempt to extract them all automatically (by setting auxpars = TRUE) or to specify them manually (by passing names or a named list to auxpars).

paul-buerkner commented 6 years ago

I think that sounds reasonable. Just one more thought. I believe the name auxpar orginated from an earlier version of brms and has later been renamed to dpars by me. The question is whether it might make sense to also rename it in tidybayes?

mjskay commented 6 years ago

Good question. That probably makes sense --- the only other package I am aware of that might need something like this currently is rethinking, whose link function can provide multiple link-level predictions depending on the model. However, the link function in that package doesn't give a name to these, so I see no reason not to defer to brms to choose terminology :).

So TODO:

mjskay commented 6 years ago

Sidebar: I'm changing this to dpar (no s) instead of dpars for consistency with brms.