pymc-devs / pymc-experimental

https://pymc-experimental.readthedocs.io

Pathfinder gives confident wrong answer with small sample prediction #279

Open fonnesbeck opened 6 months ago

fonnesbeck commented 6 months ago

This example is taken from the baseball case study in pymc-examples. We fit a beta-binomial model to some baseball batting data:

import pandas as pd
import pymc as pm

data = pd.read_csv(pm.get_data("efron-morris-75-data.tsv"), sep="\t")

N = len(data)
player_names = data["FirstName"] + " " + data["LastName"]
# coords = {"player_names": player_names.tolist()}

with pm.Model() as baseball_model:
    at_bats = pm.MutableData("at_bats", data["At-Bats"].to_numpy())
    n_hits = pm.MutableData("n_hits", data["Hits"].to_numpy())
    baseball_model.add_coord("player_names", player_names, mutable=True)

    phi = pm.Uniform("phi", lower=0.0, upper=1.0)

    kappa_log = pm.Exponential("kappa_log", lam=1.5)
    kappa = pm.Deterministic("kappa", pm.math.exp(kappa_log))

    theta = pm.Beta("theta", alpha=phi * kappa, beta=(1.0 - phi) * kappa, dims="player_names")
    y = pm.Binomial("y", n=at_bats, p=theta, observed=n_hits, dims="player_names")

and then add a prediction for a fictional player that has zero hits in 4 appearances:

with baseball_model:
    theta_new = pm.Beta("theta_new", alpha=phi * kappa, beta=(1.0 - phi) * kappa)
    y_new = pm.Binomial("y_new", n=4, p=theta_new, observed=0)

What should occur (and does with either pymc.sample or pymc.fit) is that, since the sample size for y_new is so small, its estimate should be shrunk towards the population mean. Here is the population of players:

[image attachment: population of players]

and the population mean is given by phi:

[image attachment: posterior of phi]

However, the estimate for theta_new is way too large (larger than that of the most extreme player in the fitting dataset), with a high degree of posterior confidence:

[image attachment: posterior of theta_new]

Running the same model with pm.fit or pm.sample returns more reasonable estimates just under the population mean.
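For intuition about how much shrinkage to expect: conditional on phi and kappa, the beta prior is conjugate to the binomial likelihood, so the posterior for theta_new is available in closed form. A quick sketch (the phi and kappa values below are made-up illustrative numbers, not the fitted posterior means from this model):

```python
# Conjugate update for theta_new, conditional on phi and kappa:
#   prior:     theta_new ~ Beta(phi*kappa, (1 - phi)*kappa)
#   data:      0 hits in 4 at-bats
#   posterior: theta_new ~ Beta(phi*kappa + 0, (1 - phi)*kappa + 4)
phi, kappa = 0.27, 80.0  # illustrative values only

alpha_post = phi * kappa + 0
beta_post = (1.0 - phi) * kappa + 4

post_mean = alpha_post / (alpha_post + beta_post)
# With only 4 at-bats against a concentration of kappa pseudo-observations,
# the posterior mean sits just below phi rather than collapsing toward 0.
```

This is the behavior the issue describes for pm.sample and pm.fit: an estimate pulled strongly toward the population mean, which pathfinder fails to reproduce.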

Using PyMC 3.10.1 and pymc-experimental from the main repo.

junpenglao commented 6 months ago

Not sure. Pathfinder returns results that underestimate kappa and theta:

[image attachment: pathfinder posterior estimates]

But this is probably a property of pathfinder; I don't work with it enough to offer a good perspective. @ColCarroll has a bit more experience, maybe he has some ideas?

ricardoV94 commented 6 months ago

See also the issues I found before with the 8-schools example, where it would basically return the initval for whatever mu was: https://gist.github.com/ricardoV94/eafd20ac47d63525253b0a8adf5e5d76

junpenglao commented 6 months ago

Yeah, pathfinder has a jaxopt dependency that has a convergence gap (compared to scipy.optimize.minimize). I think on the blackjax side we could be more explicit about detecting non-convergence.
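One symptom reported above is the optimizer silently returning its initial point. A check along these lines could flag that (a sketch only; the helper name and dense-vector positions are assumptions, not blackjax API):

```python
import math

def looks_unconverged(init_position, final_position, tol=1e-12):
    """Return True if the optimizer apparently never moved away from
    its starting point, one symptom of a silently failed optimization."""
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(init_position, final_position)
    )

# A run that hands back its initial point unchanged is suspect:
suspect = looks_unconverged([0.0, 1.0], [0.0, 1.0])
# whereas a run that moved is not:
ok = not looks_unconverged([0.0, 1.0], [0.3, 0.9])
```

A real implementation would more likely inspect the optimizer's own status flags or the ELBO trace, but even this crude comparison would have caught the initval-returning behavior in the 8-schools gist.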

junpenglao commented 6 months ago

In the interim, I suggest adding some noise to the initial position: https://github.com/pymc-devs/pymc-experimental/blob/00d7a2b3cf3379e0a9420fb436667ab781e5a5e7/pymc_experimental/inference/pathfinder.py#L104, so that at the very least we can run pathfinder a couple of times.

ricardoV94 commented 6 months ago

You can use this to add jitter to RVs: https://github.com/pymc-devs/pymc/blob/0fd7b9e1d2208f1250b1c804bf5421013dba9023/pymc/initial_point.py#L111