paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.27k stars 183 forks source link

The support of the GEV distribution is not enforced with the currently implemented xi transform #1345

Closed danibarna closed 7 months ago

danibarna commented 2 years ago

Hello,

The transform on the xi parameter as implemented in fun_scale_xi.stan for the generalized extreme value in brms does not enforce the support of the distribution.

We noticed this problem when attempting to fit the brms implementation of the GEV to flood datasets. Most of the time the error is not bad enough to throw off the parameter estimates entirely and the underlying problem is rather subtle. I attempted to explain what we think is happening in this markdown and have a minimal working example (using basic Stan models) here:

working_example_gev.zip

We believe a solution to the support problem exists for a local GEV model via the quantile-based reparameterization described in the markdown. However, enforcing the support in a distributional regression model is more difficult, f. ex. it is not obvious how to express the support requirements in terms of link functions on the parameters in the reparameterized model.

Happy to discuss things further if needed. Such a model would be really useful for statistical modeling of floods.

paul-buerkner commented 2 years ago

Thanks, I will take a look soon.

paul-buerkner commented 2 years ago

Thank you again for analysing the problem in so much details and apologies for being so slow to respond proberly. In terms of a way forward, perhaps it could make sense to define a new gen_extreme_value2 family (either custom family, or later on, a built-in family). How much effort would it be for you to make a custom family out of your proposal? I am happy to discuss this further here on in a call if you like.

One minor question, I have about the currently applied transform that you mentioned: Your markdown says inv_logit(kappa)^gamma + ... while my code reads inv_logit(kappa) * gamma + .... Did I mistake with the latter or was the ^ just a typo?

danibarna commented 2 years ago

Thank you, the ^ is a typo in the markdown--I checked the code in the example models and it is fortunately correct there.

A custom family with a quantile-based parameterization of the GEV was our original goal. We were able to implement this parameterization in a local (i.e. non-covariate dependent) Stan model. The problem is that this is not easily scalable to a brms model. Specifically, we could not find a way to enforce parameter bounds once the parameters depend on covariates. To illustrate the sticking point:

The bounds on $\xi$ under the reparameterization are given as

$$ \frac{W{0}\left( \frac{e^{\beta}\eta \text{log}(\text{log}(2))}{\eta - y{max}} \right)}{\text{log}(\text{log}(2))} < \xi < \frac{W{0}\left( \frac{e^{\beta}\eta \text{log}(\text{log}(2))}{\eta - y{min}} \right)}{\text{log}(\text{log}(2))} $$

where is $\mathbb{W}_0$ is one of the real branches of the Lambert-W function, $\beta$ is a reparameterized scale parameter and the new location parameter is the 0.5 quantile (see markdown in original post for details).

It is difficult to think of a link function that would enforce these bounds. We worked on this with our collaborators at University of Göttingen this spring but eventually dropped it.

Unfortunately I do not see a clear path forward to a custom quantile-based gev family in brms (although I think it is technically possible, just very difficult).

danibarna commented 2 years ago

The problem we identify with the current brms implementation of the GEV is that when the location parameter falls outside the range of observed data, the link function implemented in brms will only propose values that violate the support of the distribution.

The impacts of this are observed model-fitting issues (divergences and problems with initialization) that are the product of a mis-specified model and thus not solvable even with infinite data and very precise priors.

We tested what happens in Stan when the support of an extreme value model is not enforced. Using the support enforcing parameterization of the GEV as reference we can widen the parameter bounds for $\xi$ outside of the support of the distribution (plotted below as a function of parameter $\beta$):

outside_support_lw_model

We can then compare behavior of the support-enforcing model to models where the parameter bounds are artificially wide. Both the +0.25 and the +0.5 model are considerably slower during the warm-up phase than the support-enforcing model and spend time proposing values outside the support. The +0.25 scenario results in 1 divergent transition after warmup. The +0.5 scenario results in 4 divergent transitions after warmup. This is still a small portion of the total chains (each of the 12 simulated datasets was run for 3 chains) and every simulated dataset managed to fit at least 2 out of 3 chains. In every case where things converged, the parameter estimates from the non-support enforcing models match that of the support-enforcing model:

posteriors

So while we might have problems fitting the model, it seems some sloppiness in the bounds doesn’t throw the sampler off entirely. It is problematic, however, to have fitting issues that arise from a mis-specified model and not a more solvable problem (such as needing better priors or a non-centered parameterization, for example). This is the behavior we were seeing with the current brms link function on $\xi$.

To enforce the support of the GEV when fitting with HMC we would need a way to define the relationship between the location parameter and the min and max of the data set. Because of this we suspect a quantile-based parameterization is the only way to enforce the support of the GEV distribution in Stan. This, however, comes with all the problems detailed in the post above and in the markdown linked in the original post.

Hopefully this is informative and happy to discuss further.

paul-buerkner commented 2 years ago

Thank you for your detailed response and explanation! I agree with your assessment in that I also currently don't see an easy way out of this. But it is good to keep this issue in mind in the hope that we will eventually come up with a principled and scalable solution.

danibarna commented 2 years ago

Agreed! Hopefully this is something that is useful in the future.

Not sure what else to do right now except note that in practice debugging the current gev implementation in brms can be very, very difficult. Especially once we start working with real-world datasets of extreme events, where the sampler doesn’t have a ton of data to pull it into nice areas of the parameter space.

There’s also some question about the validity of this whole implementation...when the link function doesn’t enforce the parameter bounds, we’re essentially defining an area of the parameter space to have zero density. I don’t know how this affects HMC.

Based on the little simulation study we ran, it seems it's still possible for the sampler to converge to the correct estimates but (a) this wasn’t tested with covariates on the parameters and (b) this one test with simulated data isn’t exactly a solid mathematical answer.

The question of what HMC does when part of the parameter space has zero density could be an interesting one to throw to the Stan discourse page.

danibarna commented 2 years ago

For references sake, here's the publication on quantile based GEV regression models for extremes that made us start looking into a quantile-based brms model.

paul-buerkner commented 7 months ago

I will close this issue for now to clean up the issue tracker a bit since I don't this this has high priority. I am also thinking of deprecating the GEV distribution in brms since it requires awkward special case code that I would prefer not to have.