pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

MultiDimensional Media Mix Model (New PR) #1036

Open cetagostini opened 2 months ago

cetagostini commented 2 months ago

Description

Creates an API to support multiple dimensions (dims) in the media mix model.


📚 Documentation preview 📚: https://pymc-marketing--1036.org.readthedocs.build/en/1036/

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 163 lines in your changes missing coverage. Please review.

Project coverage is 91.90%. Comparing base (94a8096) to head (e97030f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pymc_marketing/mmm/MultiDimensionalMMM.py | 0.00% | 163 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1036      +/-   ##
==========================================
- Coverage   95.59%   91.90%   -3.69%
==========================================
  Files          39       40       +1
  Lines        4064     4227     +163
==========================================
  Hits         3885     3885
- Misses        179      342     +163
```

tim-mcwilliams commented 4 weeks ago

@cetagostini I've been experimenting with this new feature and came across a potential bug. When trying to use partial pooling across a geo dim, I run into a broadcasting error.

It looks like the error is thrown when creating the channel_contributions variable, specifically in the forward_pass function. Right now the function's dims are set to look only at "channel":

return second.apply(x=first.apply(x=x, dims="channel"), dims="channel")

However, modifying that to include the dims passed to the VanillaMultiDimensionalMMM class, like so:

return second.apply(x=first.apply(x=x, dims=(*self.dims,"channel")), dims=(*self.dims,"channel"))

fixes the broadcasting error and I was able to fit the model from there.

Here's the full traceback - @wd60622

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[150], line 1
----> 1 mmm_fit = mmm.fit(
      2     X=region_model_data.drop(columns="units"),
      3     y=region_model_data.drop(columns=Xs)
      4 )

File ~/code/pymc-marketing/docs/source/notebooks/mmm/MultiDimensionalMMM.py:548, in VanillaMultiDimensionalMMM.fit(self, X, y, progressbar, predictor_names, random_seed, **kwargs)
    545     predictor_names = []
    547 if not hasattr(self, "model"):
--> 548     self.build_model(X, y)
    550 # sampler_kwargs = create_sample_kwargs(
    551 #     self.sampler_config,
    552 #     progressbar,
    553 #     random_seed,
    554 #     **kwargs,
    555 # )
    556 with self.model:

File ~/code/pymc-marketing/docs/source/notebooks/mmm/MultiDimensionalMMM.py:431, in VanillaMultiDimensionalMMM.build_model(self, X, y, **kwargs)
    426     pass
    428 else:
    429     channel_contributions = pm.Deterministic(
    430         name="channel_contributions",
--> 431         var=self.forward_pass(x=channel_data_),
    432         dims=("date", *self.dims, "channel"),
    433     )
    435 mu_var = intercept + channel_contributions.sum(axis=-1)
    437 if (
    438     self.control_columns is not None
    439     and len(self.control_columns) > 0
    440     and all(column in X.columns for column in self.control_columns)
    441 ):

File ~/code/pymc-marketing/docs/source/notebooks/mmm/MultiDimensionalMMM.py:284, in VanillaMultiDimensionalMMM.forward_pass(self, x)
    259 """Transform channel input into target contributions of each channel.
    260 
    261 This method handles the ordering of the adstock and saturation
   (...)
    276 
    277 """
    278 first, second = (
    279     (self.adstock, self.saturation)
    280     if self.adstock_first
    281     else (self.saturation, self.adstock)
    282 )
--> 284 return second.apply(x=first.apply(x=x, dims="channel"), dims="channel")

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pymc_marketing/mmm/components/base.py:555, in Transformation.apply(self, x, dims)
    522 def apply(self, x: pt.TensorLike, dims: Dims | None = None) -> TensorVariable:
    523     """Call within a model context.
    524 
    525     Used internally of the MMM to apply the transformation to the data.
   (...)
    553 
    554     """
--> 555     kwargs = self._create_distributions(dims=dims)
    556     return self.function(x, **kwargs)

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pymc_marketing/mmm/components/base.py:315, in Transformation._create_distributions(self, dims)
    311     var = dist.create_variable(variable_name)
    312     return dim_handler(var, dist.dims)
    314 return {
--> 315     parameter_name: create_variable(parameter_name, variable_name)
    316     for parameter_name, variable_name in self.variable_mapping.items()
    317 }

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pymc_marketing/mmm/components/base.py:312, in Transformation._create_distributions.<locals>.create_variable(parameter_name, variable_name)
    310 dist = self.function_priors[parameter_name]
    311 var = dist.create_variable(variable_name)
--> 312 return dim_handler(var, dist.dims)

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pymc_marketing/prior.py:192, in create_dim_handler.<locals>.func(x, dims)
    191 def func(x: pt.TensorLike, dims: Dims) -> pt.TensorVariable:
--> 192     return handle_dims(x, dims, desired_dims)

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pymc_marketing/prior.py:182, in handle_dims(x, dims, desired_dims)
    177 args = [
    178     "x" if missing else idx
    179     for (idx, missing) in zip(new_idx, missing_dims, strict=False)
    180 ]
    181 args = _remove_leading_xs(args)
--> 182 return x.dimshuffle(*args)

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pytensor/tensor/variable.py:347, in _tensor_py_operators.dimshuffle(self, *pattern)
    345 if (len(pattern) == 1) and (isinstance(pattern[0], list | tuple)):
    346     pattern = pattern[0]
--> 347 op = pt.elemwise.DimShuffle(list(self.type.broadcastable), pattern)
    348 return op(self)

File /opt/anaconda3/envs/marketing_env/lib/python3.12/site-packages/pytensor/tensor/elemwise.py:171, in DimShuffle.__init__(self, input_broadcastable, new_order)
    168             drop.append(i)
    169         else:
    170             # We cannot drop non-broadcastable dimensions
--> 171             raise ValueError(
    172                 "Cannot drop a non-broadcastable dimension: "
    173                 f"{input_broadcastable}, {new_order}"
    174             )
    176 # This is the list of the original dimensions that we keep
    177 self.shuffle = [x for x in new_order if x != "x"]

ValueError: Cannot drop a non-broadcastable dimension: [False, False], (0,)
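
For anyone following along, here is a rough pure-Python sketch of why passing only "channel" can fail when a prior also carries a geo dim. It is an illustration under assumptions, not the library's `handle_dims` code: `dimshuffle_pattern` is a hypothetical helper, and the prior dims `("geo", "channel")` are assumed for the example. The idea is that aligning a variable to the desired dims may insert broadcastable axes (`"x"`) for dims the variable lacks, but it cannot drop a dim the variable already has.

```python
def dimshuffle_pattern(var_dims, desired_dims):
    """Build a dimshuffle-style pattern aligning var_dims to desired_dims.

    Hypothetical helper for illustration only. "x" marks a new
    broadcastable axis for a desired dim that the variable lacks.
    """
    # Dims the variable has but the target does not would need dropping,
    # which is not allowed for axes of length > 1.
    dropped = [d for d in var_dims if d not in desired_dims]
    if dropped:
        raise ValueError(
            f"Cannot drop non-broadcastable dims {dropped}: "
            f"{var_dims} -> {desired_dims}"
        )
    return tuple(
        var_dims.index(d) if d in var_dims else "x" for d in desired_dims
    )


prior_dims = ("geo", "channel")  # assumed dims of a partially pooled prior

# Only "channel" requested: "geo" would have to be dropped, so this fails.
try:
    dimshuffle_pattern(prior_dims, ("channel",))
except ValueError as err:
    print(err)

# With the model dims included, as in the proposed change, alignment works.
print(dimshuffle_pattern(prior_dims, ("geo", "channel")))    # (0, 1)
print(dimshuffle_pattern(("channel",), ("geo", "channel")))  # ('x', 0)
```

The last call shows that priors defined only over "channel" would still align, since a missing dim is simply broadcast.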

wd60622 commented 3 weeks ago

Good catch, @tim-mcwilliams. So you got it working with this fix, right?

@cetagostini You can check the dims of the priors at initialization to catch errors like this earlier, if needed.
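
Something along these lines might work as an early check. This is only a sketch under assumptions: `validate_prior_dims` is a hypothetical helper, the parameter names `alpha` and `lam` are made up, and the input is assumed to be a plain mapping from parameter name to a tuple of dim names rather than the actual prior objects.

```python
def validate_prior_dims(prior_dims, model_dims, channel_dim="channel"):
    """Raise early if any prior uses a dim the model does not declare.

    prior_dims: mapping of parameter name -> tuple of dim names (assumed shape).
    model_dims: extra dims passed to the model, e.g. ("geo",).
    """
    allowed = {*model_dims, channel_dim}
    # Collect, per parameter, the dims that the model does not know about.
    unknown = {
        name: tuple(d for d in dims if d not in allowed)
        for name, dims in prior_dims.items()
        if any(d not in allowed for d in dims)
    }
    if unknown:
        raise ValueError(f"Prior dims not in {sorted(allowed)}: {unknown}")


# Passes silently: every prior dim is a model dim or "channel".
validate_prior_dims(
    {"alpha": ("geo", "channel"), "lam": ("channel",)},
    model_dims=("geo",),
)

# Fails at construction time instead of deep inside model building.
try:
    validate_prior_dims({"alpha": ("geo", "channel")}, model_dims=())
except ValueError as err:
    print(err)
```

Calling a check like this from the class constructor would surface a dims mismatch with a readable message before any variables are created.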

tim-mcwilliams commented 3 weeks ago

@wd60622 thanks! Correct, with the fix I was able to get the model working.