pymc-devs / pymc-examples

Examples of PyMC models, including a library of Jupyter notebooks.
https://www.pymc.io/projects/examples/en/latest/
MIT License

Divergences in examples/generalized_linear_models/GLM-ordinal-regression.ipynb #649

Open erik-werner opened 3 months ago

erik-werner commented 3 months ago

Regression Models with Ordered Categorical Outcomes: https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-ordinal-regression.html

Issue description

After make_model() is defined, five different models are created and sampled from. Unfortunately, models 1 and 2 produce hundreds of divergences. Models 3 and 4 sample fine. Model 5 is identical to model 4; presumably the argument logit=False is missing from its call. If I add that argument, 2 of 4 chains get completely stuck during sampling.

These divergences were not reported automatically because the numpyro sampler was used. Since the most recent release, however, divergence warnings are emitted for this sampler as well. 👍
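For reference, divergence counts can also be read directly off the trace's sample_stats group, independent of which sampler produced it. The snippet below uses a plain NumPy array as a stand-in for idata.sample_stats["diverging"] (a boolean chain-by-draw array), just to make the counting logic concrete; with a real trace the same calls apply to the actual DataArray.

```python
import numpy as np

# Stand-in for idata.sample_stats["diverging"]: a (chain, draw) boolean
# array, True wherever a NUTS transition diverged.
diverging = np.zeros((4, 1000), dtype=bool)
diverging[0, :250] = True  # pretend chain 0 diverged on its first 250 draws

n_divergences = int(diverging.sum())  # total across all chains
per_chain = diverging.sum(axis=1)     # breakdown by chain
print(n_divergences, per_chain.tolist())
```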

Proposed solution

Fix the models so that they sample without divergences. If that turns out to be difficult, consider removing them, or add a discussion of why these models are problematic.

erik-werner commented 3 months ago

@NathanielF would you like to have a look at this? Or do you want me to see if I can fix it?

NathanielF commented 3 months ago

Feel free to have a look @erik-werner ! Thanks for bringing it up

NathanielF commented 3 months ago

Just a note here @erik-werner

Had a quick look. I think the code you want to run is:


def make_model(priors, model_spec=1, constrained_uniform=False, logit=True):
    with pm.Model() as model:
        # Cutpoints: either the constrained-uniform construction, or an
        # ordered Normal prior with an Exponential prior on its scale.
        if constrained_uniform:
            cutpoints = constrainedUniform(K, 0, K)
        else:
            sigma = pm.Exponential("sigma", priors["sigma"])
            cutpoints = pm.Normal(
                "cutpoints",
                mu=priors["mu"],
                sigma=sigma,
                transform=pm.distributions.transforms.univariate_ordered,
            )

        # Linear predictor: add salary, work_sat and work_from_home in turn.
        if model_spec == 1:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=1)
            mu = pm.Deterministic("mu", beta[0] * df.salary)
        elif model_spec == 2:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=2)
            mu = pm.Deterministic("mu", beta[0] * df.salary + beta[1] * df.work_sat)
        elif model_spec == 3:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=3)
            mu = pm.Deterministic(
                "mu", beta[0] * df.salary + beta[1] * df.work_sat + beta[2] * df.work_from_home
            )
        # Likelihood: logit or probit link for the ordinal outcome.
        if logit:
            y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=mu, observed=df.explicit_rating)
        else:
            y_ = pm.OrderedProbit("y", cutpoints=cutpoints, eta=mu, observed=df.explicit_rating)
        # Use the default NUTS sampler for the probit model (numpyro gets
        # stuck there); use numpyro for the logit models.
        if not logit:
            idata = pm.sample(idata_kwargs={"log_likelihood": True})
        else:
            idata = pm.sample(nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True})
        idata.extend(pm.sample_posterior_predictive(idata))
    return idata, model

priors = {"sigma": 1, "beta": [0, 1], "mu": np.linspace(0, K, K - 1)}
idata1, model1 = make_model(priors, model_spec=1)
idata2, model2 = make_model(priors, model_spec=2)
idata3, model3 = make_model(priors, model_spec=3)
idata4, model4 = make_model(priors, model_spec=3, constrained_uniform=True)
idata5, model5 = make_model(priors, model_spec=3, constrained_uniform=True, logit=False)

Models 1 and 2 are somewhat underspecified, so you will get divergences. The interesting contrast is between models 3 and 4, where model 4 uses the constrained uniform prior on the cutpoints. If you fix the fifth model to pass logit=False and use the default sampler rather than numpyro, you should be able to run the probit fit.
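As a side note on why the constrained-uniform cutpoints sample better: the idea (sketched here with plain NumPy draws, not the notebook's constrainedUniform helper, whose exact implementation may differ) is to build the K-1 cutpoints as the scaled cumulative sum of a Dirichlet draw, so they are strictly ordered and confined to (0, K] by construction and the sampler never has to fight the ordering constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 7  # number of ordinal categories (illustrative value)

# K-1 positive weights summing to 1; their cumulative sum, scaled by K,
# gives K-1 strictly increasing cutpoints in (0, K].
w = rng.dirichlet(np.ones(K - 1))
cutpoints = np.cumsum(w) * K

print(np.round(cutpoints, 2))
```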

This is also arguably an interesting contrast, just to see how the parameter values shift... I wouldn't remove models 1 and 2, but I'd add text discussing how a better-specified model avoids the divergence issues you've highlighted.

erik-werner commented 3 months ago

@NathanielF These were basically my conclusions as well 🙂
One more thing: the divergences for models 1 and 2 seem to occur when sigma is very close to 0, so in principle one could probably eliminate most of them with a non-centered parametrisation. But that doesn't combine well with the ordered transform.
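For readers unfamiliar with the trick: a non-centered parametrisation samples a standard-normal variable and rescales it, so the sampler explores sigma and the raw offsets separately and avoids the funnel geometry that appears as sigma shrinks toward 0. A NumPy illustration of the distributional equivalence (the snag noted above is that the ordered transform would then have to be applied to the rescaled quantity, which PyMC's transform machinery doesn't directly support):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n = 2.0, 0.5, 100_000

# Non-centered: draw z ~ Normal(0, 1), then shift and scale.
# x = mu + sigma * z is distributed as Normal(mu, sigma), but the sampler
# would see z and sigma as (nearly) independent coordinates.
z = rng.normal(0.0, 1.0, n)
x = mu + sigma * z

print(round(x.mean(), 2), round(x.std(), 2))
```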

I wonder why the numpyro sampler cannot handle the probit model.

Anyway, since you have a better overview of the whole flow, and seem to have an idea about what to write, it would be great if you'd like to update the notebook.

NathanielF commented 3 months ago

Sure thing @erik-werner. Will do.

On numpyro, I think it has to do with a lack of parity between the JAX implementation of the erf function and the one in SciPy.
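To unpack that slightly: the probit likelihood evaluates the standard normal CDF Φ, which is typically built from erf, so a precision mismatch between erf implementations (especially in the tails) can produce bad gradients or log-probabilities and stall the sampler. A stdlib-only sketch of the link function — an illustration of the relationship, not PyMC's internal code:

```python
import math

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# In an ordered-probit model, P(Y <= k | eta) = Phi(cutpoint_k - eta);
# deep-tail evaluations like Phi(-8) are where implementations diverge.
print(std_normal_cdf(0.0))  # 0.5 by symmetry
```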

Thanks again for bringing the issue to my attention.