Not Possible to Have Channel Contributions for Out-of-Sample / Non-Fit Data?

louismagowan commented 3 months ago

Hi there,

I've attached some code and screenshots below which should hopefully illustrate my point. Essentially, I'm building my MMM with some training data and then testing its predictions with some test data. However, I would like to get channel contributions not just for my training data (which is in the idata), but also for my test data and the full dataset (train + test).

I would appreciate any advice or tips you have 🙏

TL;DR

- I can't seem to get channel_contributions for any data other than what was used to fit my MMM
- predict, predict_posterior and sample_posterior_predictive can extend idata with predictions of my target
- But they don't seem to allow extending idata with channel contributions

import pandas as pd
from pymc_marketing.mmm import MMM

# Load mmm, train and test data
mmm = MMM.load("models/model.nc")
train = pd.read_csv("train.csv", parse_dates=[c.DATE_COLUMN])
test = pd.read_csv("test.csv", parse_dates=[c.DATE_COLUMN])

# Recreate object structure used during training
y_train = train[c.TARGET_COLUMN]
X_train = train.drop(columns=[c.TARGET_COLUMN])
y_test = test[c.TARGET_COLUMN]
X_test = test.drop(columns=[c.TARGET_COLUMN])
input_df = pd.concat([train, test])
y = input_df[c.TARGET_COLUMN]
X = input_df.drop(columns=[c.TARGET_COLUMN])

# MMM was fit using train data <- I want channel contributions over the full dataset, test and train
contribs = mmm.compute_mean_contributions_over_time()
print(contribs.shape[0])

mmm.sample_posterior_predictive(
    X,
    progressbar=False,
    extend_idata=True,
    random_seed=rng,
)

mmm.idata.posterior.channel_contributions.date.shape[0]

mmm.idata.posterior_predictive.date.shape[0]

# MMM was fit using train data <- I want channel contributions over the full dataset, test and train
contribs = mmm.compute_mean_contributions_over_time()
print(contribs.shape[0])

AlfredoJF commented 3 months ago

Hey @louismagowan,

You could use this alternative to get those channel contributions over time if you pass more model variables to var_names.

I'll let the dev team answer why it's limited to self.fit_result and why extend_idata=True is not used in any of the out-of-sample examples.

y_test_pred = mmm.sample_posterior_predictive(
    X_pred=X_test,
    extend_idata=False,
    include_last_observations=True,
    var_names=["y", "channel_contributions"],
    original_scale=False
)

mean_contributions_by_channel = (
    y_test_pred['channel_contributions']
    .mean(dim='sample')
    .values
    * mmm.get_target_transformer()["scaler"].scale_  # rescale contribution
)

Note that I'm using 0.7.0 and set original_scale=False because I got this error when original_scale=True: ValueError: Found array with dim 3. None expected <= 2. So you would need to rescale the predictions yourself, as in the snippet above.
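
The manual rescaling can be tried without a fitted model. Below is a minimal NumPy-only sketch with made-up numbers. It assumes the target was scaled by its maximum absolute value (the quantity scikit-learn's MaxAbsScaler stores in scale_) and that the contributions array is laid out as (date, channel, sample). The names y_train, contribs_scaled and rescale_mean_contributions are hypothetical and not part of pymc-marketing.

```python
import numpy as np


def rescale_mean_contributions(contribs_scaled, target_scale):
    """Average over the last (sample) axis, then map back to target units."""
    mean_scaled = contribs_scaled.mean(axis=-1)  # shape: (date, channel)
    return mean_scaled * target_scale


# Made-up training target; max-abs scaling divides by max(|y|).
y_train = np.array([120.0, 80.0, 200.0, 150.0])
target_scale = np.abs(y_train).max()  # assumption: mirrors MaxAbsScaler's scale_

# Made-up scaled contributions: 3 dates x 2 channels x 4 posterior samples.
rng = np.random.default_rng(0)
contribs_scaled = rng.uniform(0.0, 0.5, size=(3, 2, 4))

contribs = rescale_mean_contributions(contribs_scaled, target_scale)
print(contribs.shape)  # (3, 2)
```

Multiplying by the scale factor works on an array of any dimensionality, whereas a scikit-learn scaler's inverse_transform accepts at most two dimensions, which is consistent with the ValueError quoted above.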

louismagowan commented 3 months ago

Thanks @AlfredoJF ! I'm gonna test out your solution now 😁

I encountered the same error when using original_scale=True, so thanks for the explanation and tips!

pymc-labs / pymc-marketing

Not Possible to Have Channel Contributions for Out-of-Sample / Non-Fit Data? #981