pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Problem when trying to get summary statistics, trace plot and posterior plot #7157

Closed philipperobertt closed 7 months ago

philipperobertt commented 7 months ago

Describe the issue:

Hello, I am building a Bayesian regression model (see the code below) using the PyMC library. When I try to get summary statistics, a trace plot, and a posterior plot, I get an error.

I am using Google Colab with the following versions of PyMC and ArviZ: pymc 5.7.2, arviz 0.15.1.

Reproducible code example:

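import arviz as az
import numpy as np
import pymc as pm

#bb_belief_xs is the reporter's pandas DataFrame (not included in this report)
prior_belief_df = bb_belief_xs.groupby('company_ciqid')['5Y_performance'].mean().reset_index()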

intervals = np.arange(0.00, 0.16, 0.01)

#prepare lists to collect aggregated data
X_aggregated = []
Y_aggregated = []

for lower_bound in intervals:
    #filter rows where marketcap_percentage is greater than or equal to the current lower bound
    filtered_df = bb_belief_xs[bb_belief_xs['marketcap_percentage'] >= lower_bound]

    #group by company_ciqid and compute the mean of 5Y_performance
    grouped_means = filtered_df.groupby('company_ciqid')['5Y_performance'].mean().reset_index()

    #for each interval, the X value is the lower bound (representing marketcap_percentage)
    #the Y values are the grouped means of 5Y_performance
    X_aggregated.extend([lower_bound] * len(grouped_means))
    Y_aggregated.extend(grouped_means['5Y_performance'].values)

#convert aggregated lists to numpy arrays
X = np.array(X_aggregated)
Y = np.array(Y_aggregated)

#compute global average and standard deviation of 5Y_performance for prior
global_mean_performance = prior_belief_df['5Y_performance'].mean()
global_std_performance = prior_belief_df['5Y_performance'].std()

with pm.Model() as model:
    #priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=global_mean_performance, sigma=global_std_performance)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)

    #expected value of outcome
    mu = alpha + beta * X

    #likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)

    #model fitting
    trace = pm.sample(100, return_inferencedata=False)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-33cf81953267> in <cell line: 1>()
----> 1 summary = az.summary(trace)
      2 print(summary)

2 frames
/usr/local/lib/python3.10/dist-packages/arviz/data/converters.py in convert_to_inference_data(obj, group, coords, dims, **kwargs)
    128             "cmdstanpy fit",
    129         )
--> 130         raise ValueError(
    131             f'Can only convert {", ".join(allowable_types)} to InferenceData, '
    132             f"not {obj.__class__.__name__}"

ValueError: Can only convert xarray dataarray, xarray dataset, dict, netcdf filename, numpy array, pystan fit, emcee fit, pyro mcmc fit, numpyro mcmc fit, cmdstan fit csv filename, cmdstanpy fit to InferenceData, not MultiTrace

PyMC version information:

Google Colab; pymc 5.7.2, arviz 0.15.1

Context for the issue:

No response

welcome[bot] commented 7 months ago

:tada: Welcome to PyMC! :tada: We're really excited to have your input into the project! :sparkling_heart:
If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.