pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

budget allocation with no control vars #1030

Open wd60622 opened 1 month ago

wd60622 commented 1 month ago

Discussed in https://github.com/pymc-labs/pymc-marketing/discussions/1028

Originally posted by **bella0715** September 12, 2024

Here's how I built the model.

```python
mmm = DelayedSaturatedMMM(
    model_config=mmm_config,
    sampler_config=sampler_config,
    date_column=date_var,
    channel_columns=spend_vars,
    control_columns=None,
    adstock_max_lag=8,
    yearly_seasonality=1,
)
```

When I try to run the below code to do the budget allocation, I get an error `UnboundLocalError: cannot access local variable '_controls' where it is not associated with a value`. How do I run the budget allocation without control_columns?

```python
response = mmm.allocate_budget_to_maximize_response(
    budget=total_budget,
    num_days=8,
    time_granularity="weekly",
    budget_bounds=budget_bounds,
)
```

wd60622 commented 1 month ago

It seems like part of the problem: https://github.com/pymc-labs/pymc-marketing/blob/936270958f79fd3cce0488ae2301f2d8f3e2a35f/pymc_marketing/mmm/mmm.py#L2074-L2077
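
For readers unfamiliar with this error class, here is a minimal sketch of how it typically arises. The `build_rows` function is a hypothetical stand-in, not the code at the linked lines: a local name is bound only inside an `if` branch and then read unconditionally.

```python
def build_rows(channels, controls=None):
    """Hypothetical stand-in for a dataset builder; not the pymc-marketing code."""
    if controls is not None:
        # `_controls` is only bound when control columns are supplied.
        _controls = {name: 0.0 for name in controls}

    # Reading `_controls` here fails when the branch above was skipped.
    return {**{ch: 1.0 for ch in channels}, **_controls}


try:
    build_rows(["tv", "radio"])
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
```

With `controls` supplied the function returns normally, so the failure only shows up for models built without control columns.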

AlfredoJF commented 1 month ago

I'm curious what the experts would consider the optimal solution. In the meantime, here is my workaround for this issue: I added a few new methods to the MMM class, extended others, and used historical control variables plus random noise, following the same approach as in `_create_synth_dataset`.

```python
from dateutil.relativedelta import relativedelta

start_date_comparison = last_date - relativedelta(years=1)
```

- Added `start_date_comparison` and `df_train_comparison` to method `_create_synth_dataset`, and added the workaround logic:
```python
    def _create_synth_dataset(
        self,
        ...
        start_date_comparison: datetime | str | None = None,
        df_train_comparison: pd.DataFrame | None = None,
    ) -> pd.DataFrame:
        """Create a synthetic dataset based on the given allocation strategy (Budget) and time granularity.

        Parameters
        ----------
        ...
        start_date_comparison : datetime | str | None
            Date from which the synthetic dataset will be created.
        df_train_comparison : pd.DataFrame | None
            Rows of the training dataset from a previous year, used as the
            source of historical control values.
        """
        ...

        if start_date_comparison is not None:
            last_date = pd.to_datetime(start_date_comparison).tz_localize(None)
        else:
            last_date = pd.to_datetime(df[date_column]).max()  # ln:2079

        ...

        new_rows = [
            ...
        ]  # ln:2108

        # Add historical control variables plus random noise
        if df_train_comparison is not None:

            synth_dataset = pd.DataFrame(new_rows)

            for control in self.control_columns:
                # abs() keeps the noise scale non-negative for negative control values
                synth_dataset[control] = [value + np.random.normal(0, noise_level * abs(value))
                                          for value in df_train_comparison[control].to_list()]

            return synth_dataset

        else:
            return pd.DataFrame(new_rows)  # ln: 2110
```

Hope this is self-explanatory and not too convoluted.

Happy to hear your thoughts and optimal solution.
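
The "historical controls plus random noise" idea can be sketched on its own, outside the MMM class. This is only an illustration using the standard library; `jitter_controls`, `history`, and the sample values are hypothetical names and data, and the noise is assumed to be Gaussian with a standard deviation proportional to each value.

```python
import random


def jitter_controls(history, noise_level=0.01, seed=None):
    """Return historical control values with proportional Gaussian noise added.

    `history` maps a control name to its list of past values. All names here
    are hypothetical and only illustrate the idea from the comment above.
    """
    rng = random.Random(seed)
    return {
        name: [
            # abs() keeps the standard deviation non-negative for negative values.
            value + rng.gauss(0.0, noise_level * abs(value))
            for value in values
        ]
        for name, values in history.items()
    }


history = {"price_index": [100.0, 102.0, 98.0], "holiday": [0.0, 1.0, 0.0]}
noisy = jitter_controls(history, noise_level=0.01, seed=42)
```

Because the noise scales with the value, zero-valued entries (such as the off days of a holiday flag) stay exactly zero, while non-zero indicator values would no longer be exactly 1. Whether that is acceptable depends on how the control enters the model.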

wd60622 commented 1 week ago

Seems like a pretty good workaround. Would you want to make a PR for this @AlfredoJF?

Seems like a simple edit around these lines might do the trick:

https://github.com/pymc-labs/pymc-marketing/blob/af946dfa8687a65018ddd4d708f434a7f32f30ab/pymc_marketing/mmm/mmm.py#L2091-L2094
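
One shape such an edit could take is sketched below. It uses a hypothetical helper, `build_synth_rows`, and is not taken from the linked lines: the controls mapping is always bound, falling back to an empty dict when no control columns are given, so the row-building code can unpack it unconditionally.

```python
def build_synth_rows(channel_spend, controls=None, n_periods=2):
    """Hypothetical row builder; names are illustrative, not the library's API."""
    # Always bind the name, falling back to an empty mapping.
    control_values = controls if controls is not None else {}

    return [
        {
            "period": i,
            **channel_spend,
            # Unpacking an empty mapping simply adds no control columns.
            **{name: values[i] for name, values in control_values.items()},
        }
        for i in range(n_periods)
    ]


rows_without_controls = build_synth_rows({"tv": 10.0, "radio": 5.0})
rows_with_controls = build_synth_rows({"tv": 10.0}, controls={"price": [1.0, 1.1]})
```

The same guard works whether the fallback is written as a conditional expression or as an explicit `else` branch next to the existing `if`.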