mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
718 stars 59 forks source link

fitted_draws.brmsfit slow on spatial model because argument "n" is not passed on? #213

Closed Raoul-Kima closed 3 years ago

Raoul-Kima commented 4 years ago

Today I was working with a brms model containing a gaussian proccess term. predicted_draws (with default settings) didn't return results in the time I was patient to wait, so I set n=10, and then it worked. Playing with "n" I found out that for this model, prediction is very slow. I then tried fitted_draws, and had the same problem there, but unlike with predicted_draws specifying "n=10" didn't solve the problem. Looking through the source code via the debugger I found that the line causing the problem is where fitted_predicted_draws_brmsfit is called, which in turn calls brms::fitted.brmsfit.

Indeed, calling brms::fitted.brmsfit by hand with the same arguments from the global environment gave me the same problem. It was because "n" was not passed on. In the global environment, I could add "nsamples=10" to the call of brms::fitted.brmsfit, which solved the problem.

Now, two things:

  1. I wasn't able to test whether changing fitted_draws.brmsfit accordingly actually solved the problem, because my R-Wizardry didn't suffice to be able to execute the "patched" function in the correct environment.
  2. There might be good reasons why "n" isn't passed on, which I don't know about. So, while at first sight this appears like an oversight, I suspect it could actually be intentional. I didn't try to read all the sourcecode to see if there is any reason not to pass "n" on.

Edit: I'm aware that a sample size of 10 is very little That's not my point here. I only used that number for testing purposes.

mjskay commented 4 years ago

Ah yeah, that is a limitation because of how I structured fitted_draws internally. The main issue (as I recall) was making a guarantee that the same n draws are used in situations where I need to call fitted.brmsfit multiple times to construct dataframes that join together (say) multiple distributional parameters. So it was easier from an implementation perspective to do the filtering to a subset of draws after the join, which means pulling all the draws out. Another option might be for me to choose a seed and re-use the same seed on successive calls to (hopefully) get values from the same draws, which is a solution worth investigating.

A workaround you might use in the meantime is to call fitted.brmsfit directly and then use the result with add_draws to construct the data frame you need. See here: http://mjskay.github.io/tidybayes/reference/add_draws.html

mjskay commented 3 years ago

In the next version of tidybayes n (now called ndraws) will be passed through in [add_]epred_draws() (the successors to [add_]fitted_draws()).