pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

Not Possible to Have Channel Contributions for Out-of-Sample / Non-Fit Data? #981

Open louismagowan opened 3 months ago

louismagowan commented 3 months ago

Hi there,

I've attached some code and screenshots below that should illustrate my point. I'm building my MMM with training data and then testing its predictions on test data. However, I would like channel contributions not just for the training data (which is in the idata), but also for the test data and for the full dataset (train + test).

I would appreciate any advice or tips you have 🙏

TL;DR

import numpy as np
import pandas as pd
from pymc_marketing.mmm import MMM

# `c` is the poster's own constants module (DATE_COLUMN, TARGET_COLUMN); its import isn't shown.
rng = np.random.default_rng(42)  # seed added for illustration; the original doesn't show how rng was created

# Load mmm, train and test data
mmm = MMM.load("models/model.nc")
train = pd.read_csv("train.csv", parse_dates=[c.DATE_COLUMN])
test = pd.read_csv("test.csv", parse_dates=[c.DATE_COLUMN])

# Recreate object structure used during training
y_train = train[c.TARGET_COLUMN]
X_train = train.drop(columns=[c.TARGET_COLUMN])
y_test = test[c.TARGET_COLUMN]
X_test = test.drop(columns=[c.TARGET_COLUMN])
input_df = pd.concat([train, test])
y = input_df[c.TARGET_COLUMN]
X = input_df.drop(columns=[c.TARGET_COLUMN])

# MMM was fit using train data <- I want channel contributions over the full dataset, test and train
contribs = mmm.compute_mean_contributions_over_time()
print(contribs.shape[0])  # number of dates in the mean contributions

# Sample the posterior predictive over train + test and add it to mmm.idata
mmm.sample_posterior_predictive(
    X,
    progressbar=False,
    extend_idata=True,
    random_seed=rng,
)

# Number of dates in the posterior (fit on train) vs. the new posterior_predictive group (sampled on X)
print(mmm.idata.posterior.channel_contributions.date.shape[0])
print(mmm.idata.posterior_predictive.date.shape[0])

# MMM was fit using train data <- I want channel contributions over the full dataset, test and train
contribs = mmm.compute_mean_contributions_over_time()
print(contribs.shape[0])
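Why the row count doesn't change can be seen with a simplified numpy sketch. It assumes that compute_mean_contributions_over_time essentially averages the posterior channel_contributions over chains and draws (ignoring whatever else the real method adds), and the array sizes are made up for illustration. Because the average is taken over the fitted posterior, its date axis is the training period, no matter what is later added to the posterior_predictive group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 chains, 100 draws, 52 training dates, 3 channels.
n_chains, n_draws, n_train_dates, n_channels = 2, 100, 52, 3

# Stand-in for mmm.idata.posterior["channel_contributions"],
# laid out as (chain, draw, date, channel).
posterior_contribs = rng.gamma(2.0, 0.05, size=(n_chains, n_draws, n_train_dates, n_channels))


def mean_contributions_over_time(contribs: np.ndarray) -> np.ndarray:
    """Hypothetical simplification: average over chain and draw, keeping (date, channel)."""
    return contribs.mean(axis=(0, 1))


mean_contribs = mean_contributions_over_time(posterior_contribs)
print(mean_contribs.shape)  # (52, 3)

# Sampling the posterior predictive for new dates adds a separate group;
# it does not modify posterior_contribs, so the result above keeps 52 rows.
```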


AlfredoJF commented 3 months ago

Hey @louismagowan,

As an alternative, you can get those channel contributions over time by passing extra model variables (here channel_contributions) to var_names when calling sample_posterior_predictive.

I'll leave it to the dev team to explain why compute_mean_contributions_over_time is limited to self.fit_result, and why extend_idata=True isn't used in any of the out-of-sample examples.

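# Sample y and the channel contributions for the test period only
y_test_pred = mmm.sample_posterior_predictive(
    X_pred=X_test,
    extend_idata=False,
    include_last_observations=True,  # carry over adstock from the end of the training data
    var_names=["y", "channel_contributions"],
    original_scale=False,  # results stay in the model's scaled units
)

# Average over posterior samples, then undo the target scaling
mean_contributions_by_channel = (
    y_test_pred['channel_contributions']
    .mean(dim='sample')
    .values
    * mmm.get_target_transformer()["scaler"].scale_  # rescale contribution
)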

Note that I'm using version 0.7.0 and set original_scale=False because I got this error with original_scale=True: ValueError: Found array with dim 3. None expected <= 2. So you need to rescale the predictions yourself.
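A minimal numpy sketch of the manual rescaling, plus stitching train and test contributions into one full-period array. It assumes the target scaler is a max-abs scaler, so multiplying by its scale_ undoes the scaling; all arrays and sizes below are hypothetical stand-ins, not pymc-marketing output. A scikit-learn transformer fitted on a single column expects 2-D input of shape (n_rows, 1), which is why a 3-D (sample, date, channel) array can trip that check, whereas a plain multiplication broadcasts over any shape.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target scale, standing in for
# mmm.get_target_transformer()["scaler"].scale_ (a length-1 array).
target_scale = np.array([1250.0])

# Stand-in for y_test_pred["channel_contributions"] in scaled units,
# dims (sample, date, channel): 200 samples, 8 test dates, 3 channels.
test_contribs_scaled = rng.gamma(2.0, 0.05, size=(200, 8, 3))

# Mean over samples, then rescale; the multiplication broadcasts over (date, channel).
test_mean_original = test_contribs_scaled.mean(axis=0) * target_scale

# Equivalent 2-D route: flatten to (n, 1), rescale, restore the shape, then average.
flat_rescaled = test_contribs_scaled.reshape(-1, 1) * target_scale
via_2d = flat_rescaled.reshape(test_contribs_scaled.shape).mean(axis=0)
assert np.allclose(via_2d, test_mean_original)

# Hypothetical training-period means, already in original units, dims (date, channel).
# Make sure both halves are on the same scale before concatenating.
train_mean_original = rng.gamma(2.0, 50.0, size=(52, 3))

# Full-period contributions: training dates followed by test dates.
full_mean_original = np.concatenate([train_mean_original, test_mean_original], axis=0)
print(full_mean_original.shape)  # (60, 3)
```

Averaging before rescaling gives the same result as rescaling every sample first, because the inverse of a max-abs scaling is linear; averaging first just avoids multiplying the full sample array.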

louismagowan commented 3 months ago

Thanks @AlfredoJF! I'm gonna test out your solution now 😁

I encountered the same error when using original_scale=True, so thanks for the explanation and tips!