pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Unexpected behavior with `pm.sample(var_names=...)` #7258

Closed tomicapretto closed 2 months ago

tomicapretto commented 2 months ago

Description

PyMC 5.13 adds the var_names parameter to pm.sample(). The documentation says: "var_names: Names of variables to be stored in the trace. Defaults to all free variables and deterministics."

This comes in very handy for something I've been trying to do in Bambi. While porting Bambi to use this feature I noticed weird results in the tests. I reproduced one of the models with plain PyMC and spotted the problem. Have a look at this:

import arviz as az
import numpy as np
import pymc as pm

batch = np.array(
    [
        1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  4,  4,  4,  4,  5,  5,  5,
        6,  6,  6,  7,  7,  7,  7,  8,  8,  8,  9,  9, 10, 10, 10
    ]
)
temp = np.array(
    [
        205, 275, 345, 407, 218, 273, 347, 212, 272, 340, 235, 300, 365,
        410, 307, 367, 395, 267, 360, 402, 235, 275, 358, 416, 285, 365,
        444, 351, 424, 365, 379, 428
    ]
)

y = np.array(
    [
        0.122, 0.223, 0.347, 0.457, 0.08 , 0.131, 0.266, 0.074, 0.182,
        0.304, 0.069, 0.152, 0.26 , 0.336, 0.144, 0.268, 0.349, 0.1  ,
        0.248, 0.317, 0.028, 0.064, 0.161, 0.278, 0.05 , 0.176, 0.321,
        0.14 , 0.232, 0.085, 0.147, 0.18
    ]
)

batch_values, batch_idx = np.unique(batch, return_inverse=True)

coords = {
    "batch": batch_values
}

with pm.Model(coords=coords) as model:
    b_batch = pm.Normal("b_batch", dims="batch")
    b_temp = pm.Normal("b_temp")
    mu = pm.Deterministic("mu", pm.math.invlogit(b_batch[batch_idx] + b_temp * temp))
    kappa = pm.Gamma("kappa", alpha=2, beta=2)

    alpha = mu * kappa
    beta = (1 - mu) * kappa

    pm.Beta("y", alpha=alpha, beta=beta, observed=y)

I want to sample the posterior, but I don't want to store the draws of "mu" by default. So I use var_names=["b_batch", "b_temp", "kappa"] (and I also sample without var_names to see the difference).

with model:
    idata_1 = pm.sample(random_seed=1234)
    idata_2 = pm.sample(var_names=["b_batch", "b_temp", "kappa"], random_seed=1234)

When I don't use var_names I get the following posterior

az.plot_trace(idata_1, backend_kwargs={"layout": "constrained"});

[trace plot of idata_1]

and when I use var_names it's the following

az.plot_trace(idata_2, backend_kwargs={"layout": "constrained"});

[trace plot of idata_2]

which makes me think it's basically omitting the likelihood and thus sampling from the prior.
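
One way to check that hypothesis is to compare the suspicious trace against explicit prior draws; a minimal sketch (assuming the model and idata_2 from above are in scope):

with model:
    prior = pm.sample_prior_predictive(random_seed=1234)

# if idata_2 really contains prior draws, these summaries should roughly agree
print(az.summary(idata_2, var_names=["b_temp", "kappa"]))
print(az.summary(prior.prior, var_names=["b_temp", "kappa"]))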

Is this behavior expected?

ricardoV94 commented 2 months ago

Is this behavior expected?

Nope. I guess we didn't incorporate it well. CC @fonnesbeck

tomicapretto commented 2 months ago

@ricardoV94 @fonnesbeck I had a look at the code, but I'm not very familiar with the trace and sampler internals. Do you have any pointers on what needs to be modified? I could give it a shot.

tomicapretto commented 2 months ago

I've been trying to better understand what's happening here and I reached the initialization method of the BaseTrace class:

https://github.com/pymc-devs/pymc/blob/60a631467ca8b166df7e852156c5fa714e912094/pymc/backends/base.py#L145-L158

So I imagined something was happening with the compiled function. I manually re-created the compiled function with var_names=None and with var_names=["b_batch", "b_temp", "kappa"].

var_names = ["b_batch", "b_temp", "kappa"]
vars_trace = [v for v in model.unobserved_RVs if v.name in var_names]
point_func = model.compile_fn(vars_trace, inputs=model.value_vars, on_unused_input="ignore") 
point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__":0})
[array([-2.04367572,  0.20578688,  0.44895   , -0.25317641,  1.70734415,
        -0.25723554, -0.58851374, -1.53500779,  0.73048534,  0.90274171]),
 array(-0.04592494),
 array(0.68558234)]

If I don't use on_unused_input="ignore" it raises

UnusedInputError: pytensor.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: b_batch.

which I think indicates the problem. Also, notice that every time you run the previous chunk it returns different values.

On the other hand, if we reproduce what var_names=None does we get

vars_trace = model.unobserved_value_vars
var_names = [var.name for var in vars_trace]  # ['b_batch', 'b_temp', 'kappa_log__', 'kappa', 'mu']
point_func = model.compile_fn(vars_trace, inputs=model.value_vars)
point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})
# output (deterministic, the same on every call):
# [array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
#  array(0.),
#  array(0.),
#  array(1.),
#  array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
#         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
#         0.5, 0.5, 0.5, 0.5, 0.5, 0.5])]

To summarize, I'm thinking that not including variables in the trace is a bit more complicated than expected? I'm not sure what the next step is. To me it looks like we need to pass all the necessary outputs to compile_fn so it works properly, but then somehow store only part of the results, the ones that match whatever we pass to var_names. But I don't know, this is new territory for me.
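
For illustration, "compile everything, store a subset" could look something like this rough sketch (my own, not a proposed patch):

var_names = ["b_batch", "b_temp", "kappa"]
all_vars = model.unobserved_value_vars
point_func = model.compile_fn(all_vars, inputs=model.value_vars)
values = point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})
# keep only the outputs whose names match var_names before writing to the trace
kept = {var.name: val for var, val in zip(all_vars, values) if var.name in var_names}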

ricardoV94 commented 2 months ago

My hunch is that with var_names it's trying to create a function of the RVs themselves (which explains why the output is random and why it does not depend on model.value_vars), but we want one of the respective value variables. We basically want to use var_names to filter out a subset of the value variables (which at some point must be collected when you pass None)?
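
A toy example consistent with that hunch, a minimal sketch of my own (throwaway model m, using the replace_rvs_by_values helper mentioned in the next comment):

import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x")

# a function of the RV itself embeds the random generator in the graph:
# every call draws a fresh sample and the value-variable input goes unused
f_rv = m.compile_fn([m["x"]], inputs=m.value_vars, on_unused_input="ignore")
print(f_rv({"x": 0.0}), f_rv({"x": 0.0}))  # two different draws

# a function of the respective value variable is deterministic in the input
f_value = m.compile_fn(m.replace_rvs_by_values([m["x"]]), inputs=m.value_vars)
print(f_value({"x": 0.0}))  # [array(0.)]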

ricardoV94 commented 2 months ago

If you check what model.unobserved_value_vars does, you'll see it grabs all RVs and deterministics, and converts them to the "respective" value variables, using something like model.replace_rvs_by_values or something similar sounding.

We want to do the same when var_names is given
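
Concretely, a minimal sketch of that idea for the model above (my adaptation, not the actual patch that landed):

var_names = ["b_batch", "b_temp", "kappa"]
vars_trace = [v for v in model.unobserved_RVs if v.name in var_names]
# convert the selected RVs/deterministics to their respective value-variable
# graphs, mirroring what model.unobserved_value_vars does in the default case
vars_trace = model.replace_rvs_by_values(vars_trace)
point_func = model.compile_fn(vars_trace, inputs=model.value_vars)
point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})
# deterministic now: zeros for b_batch, 0.0 for b_temp, 1.0 for kappa,
# matching the corresponding entries of the var_names=None output above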

ricardoV94 commented 2 months ago

Reopening, as a robust test is still missing.

tomicapretto commented 2 months ago

Would a comparison of draws between two posteriors, one with and one without var_names, be enough? Or do you have something more elaborate in mind?

ricardoV94 commented 2 months ago

Would a comparison of draws between two posteriors, one with and one without var_names, be enough? Or do you have something more elaborate in mind?

That's what I had in mind.
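
For reference, a hypothetical sketch of such a test (the test actually added to PyMC may differ; assumes the model and imports from the first comment):

def test_sample_var_names_matches_full_trace():
    with model:
        idata_full = pm.sample(tune=100, draws=100, chains=1, random_seed=1234)
        idata_sub = pm.sample(
            var_names=["b_batch", "b_temp", "kappa"],
            tune=100, draws=100, chains=1, random_seed=1234,
        )
    # the filtered trace drops "mu" but should otherwise hold identical draws
    assert "mu" not in idata_sub.posterior
    for name in ["b_batch", "b_temp", "kappa"]:
        np.testing.assert_allclose(
            idata_full.posterior[name].values, idata_sub.posterior[name].values
        )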