Closed: jscamac closed this issue 1 year ago.
I've reposted this on the Stan Discourse, as I'm not sure whether this is a brms bug, an issue with the model, or an issue with a dependency package such as loo or rstantools.
https://discourse.mc-stan.org/t/grouped-kfold-return-nan/31843/3
This has been resolved. The problem was mostly caused by weak priors on non-linear parameters that needed to be positive. The solution was to estimate those non-linear parameters on the log scale in their submodels and exponentiate them in the main model formula.
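To illustrate why the log-scale parameterization helps, here is a minimal Python/NumPy sketch rather than brms code. It assumes a hypothetical power-law curve `a * x**b` and a likelihood that takes the log of that curve (as a lognormal-style model would). The real model's non-linear function isn't reproduced here. Under a weak prior, draws of `a` on its natural scale can fall below zero, which makes the log of the curve NaN. Sampling `log_a` and exponentiating keeps every draw positive.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical strictly positive predictor (e.g. years of growth).
x = np.array([1.0, 5.0, 10.0, 20.0])
b = 0.7  # hypothetical fixed exponent

def mean_curve(a, b, x):
    """Hypothetical power-law curve standing in for the real non-linear function."""
    return a * x ** b

# Natural-scale draws of `a` under a weak prior: a sizeable share are negative.
a_raw = rng.normal(loc=0.5, scale=2.0, size=1000)

# Log-scale draws of `log_a`, exponentiated before use: always positive.
log_a = rng.normal(loc=np.log(0.5), scale=2.0, size=1000)
a_from_log = np.exp(log_a)

with np.errstate(invalid="ignore"):
    # Log of the curve for every draw/observation; NaN wherever a < 0.
    log_mu_raw = np.log(mean_curve(a_raw[:, None], b, x))
log_mu_exp = np.log(mean_curve(a_from_log[:, None], b, x))

print("NaNs with natural-scale a:", np.isnan(log_mu_raw).sum())
print("NaNs with exp(log_a):     ", np.isnan(log_mu_exp).sum())
```

In brms terms this roughly corresponds to giving the parameter its own non-linear submodel on the log scale and wrapping it in `exp()` in the main formula; the exact formula used here isn't shown.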
Hi, I've successfully fitted two brms models in which canopy area is modelled using different non-linear functions (an example is shown below). Each model has converged with few (if any) divergent transitions, and every parameter has good effective sample sizes.
However, when I run a grouped k-fold cross-validation I get NaNs (see below), and I'm not sure why.
An example of the model I'm fitting looks like this:
As stated above, the model seems to fit with minimal issues (apart from taking ages to finish sampling; there are over 40,000 rows of data), and the parameter estimates look reasonable. What I want to do is compare the predictive performance of different model variants using grouped k-fold cross-validation, where the interest is in how well each model predicts held-out random-effect groups. To do that, I use the following:
When I run the above, I get the following output, with no error or warning messages:
I suspect the issue might be the considerable variability in the number of observations among folds, e.g.:
I've looked at each of these folds, and there appears to be reasonable variability in the other predictors (e.g. `growth_years` and `street_tree`). I've even managed to fit each of these fold subsets individually without running into convergence issues. My assumption is that, under the hood, the unevenness among folds is causing a problem, although when I try to replicate similarly uneven folds in mock datasets I don't run into this issue. Unfortunately, I can't share the data. Any tips or advice?
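For intuition about how uneven folds arise, the following Python sketch assigns whole groups to folds at random so that each fold receives roughly the same number of groups. This is similar in spirit to a grouped k-fold split (e.g. brms `kfold()` with `folds = "grouped"` and a `group` variable), but it is not the brms or loo implementation, and the group sizes are made up. Because folds are balanced by groups rather than by rows, the number of observations per fold can differ a lot when a few groups are large.

```python
import numpy as np

def grouped_kfold_split(groups, K, seed=0):
    """Assign every observation of a group to the same fold (numbered 1..K).

    Groups (not observations) are shuffled and dealt round-robin into folds,
    so folds hold similar numbers of groups but can hold very different
    numbers of observations when group sizes vary.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(groups))
    fold_of_group = {g: (i % K) + 1 for i, g in enumerate(shuffled)}
    return np.array([fold_of_group[g] for g in groups])

# Hypothetical data: 30 groups (e.g. sites) with very unequal sizes.
rng = np.random.default_rng(42)
sizes = rng.integers(low=5, high=3000, size=30)
groups = np.repeat(np.arange(30), sizes)

folds = grouped_kfold_split(groups, K=5)
fold_ids, obs_per_fold = np.unique(folds, return_counts=True)
print(dict(zip(fold_ids.tolist(), obs_per_fold.tolist())))
```

Every group ends up entirely in one fold, which is what makes this a test of prediction to unseen groups; the price is that fold sizes in rows are not controlled.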
Oh, and in case you're interested, this is part of the pointwise samples:
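As a rough sketch of why a handful of bad draws can turn the whole result into NaN: the pointwise k-fold elpd for a held-out observation is, roughly, the log of the mean of its likelihood over the posterior draws from the fit that excluded it. The Python below uses a made-up 4-by-3 matrix of held-out log-likelihood draws (rows are draws, columns are observations) containing one NaN, standing in for a draw where a non-linear parameter went negative. That single NaN makes that observation's elpd NaN, and therefore the summed elpd as well.

```python
import numpy as np

def log_mean_exp(log_lik):
    """Column-wise log of the mean of exp(log_lik) over draws (rows), computed stably."""
    m = np.max(log_lik, axis=0)  # NaN if the column contains a NaN
    return m + np.log(np.mean(np.exp(log_lik - m), axis=0))

# Made-up held-out log-likelihood draws: 4 draws x 3 observations.
log_lik = np.array([
    [-1.2, -0.8, -2.0],
    [-1.0, -0.9, -2.1],
    [-1.1, np.nan, -1.9],  # one bad draw, e.g. from a negative non-linear parameter
    [-1.3, -0.7, -2.2],
])

elpd_pointwise = log_mean_exp(log_lik)
print(elpd_pointwise)        # second entry is nan
print(elpd_pointwise.sum())  # nan: one bad draw propagates to the total

# Locate non-finite draws/observations as a first diagnostic step.
bad_draws, bad_obs = np.where(~np.isfinite(log_lik))
print(list(zip(bad_draws.tolist(), bad_obs.tolist())))  # [(2, 1)]
```

Checking which draws are non-finite, and what parameter values those draws have, is a quick way to trace the NaNs back to a specific parameter.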