google / lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.
https://lightweight-mmm.readthedocs.io/en/latest/index.html
Apache License 2.0

Budget Allocation Percentage breakdown by channel #294

Status: Open. DongHarry-Kang opened this issue 6 months ago.

DongHarry-Kang commented 6 months ago

Hi all -

I am currently using the budget allocator from LightweightMMM, but I noticed that the pre-optimization ("Before optimization") budget allocation is not correct. Could somebody explain why this happens? For my analysis, n_time_periods = 10.

[Image]


My actual channel spend breakdown is as follows:

| Channel | Share of actual spend |
| -- | -- |
| Channel 0 | 36.7% |
| Channel 1 | 33.5% |
| Channel 2 | 15.1% |
| Channel 3 | 5.8% |
| Channel 4 | 3.8% |
| Channel 5 | 1.2% |
| Channel 6 | 2.5% |
| Channel 7 | 1.2% |
| Channel 8 | 0.1% |
| Channel 9 | 0.0% |
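One possible reason the pre-optimization bars differ from actual spend shares is how the baseline is weighted. The sketch below assumes, based on a reading of the LightweightMMM source rather than its documented API, that the baseline allocation is the historical mean media units per channel multiplied by `prices` and then normalized (an `n_time_periods` factor would cancel in the normalization). Under that assumption, passing `prices` of all ones yields shares of media units, not shares of spend. The helper names and the data are hypothetical.

```python
import numpy as np


def spend_shares(costs: np.ndarray) -> np.ndarray:
    """Share of total spend per channel, from unscaled total cost per channel."""
    costs = np.asarray(costs, dtype=float)
    return costs / costs.sum()


def baseline_shares(media_units: np.ndarray, prices: np.ndarray) -> np.ndarray:
    """Assumed baseline: mean historical media units per channel times price, normalized.

    media_units: unscaled array of shape (n_periods, n_channels).
    prices: cost per media unit, shape (n_channels,).
    """
    mean_units = np.asarray(media_units, dtype=float).mean(axis=0)
    avg_spend = mean_units * np.asarray(prices, dtype=float)
    return avg_spend / avg_spend.sum()


# Hypothetical data: 3 periods, 2 channels measured in media units (e.g. impressions).
media_units = np.array([[100.0, 10.0], [120.0, 10.0], [80.0, 10.0]])
costs = np.array([300.0, 300.0])                # total spend per channel
true_prices = costs / media_units.sum(axis=0)   # average cost per media unit

print(spend_shares(costs))                          # [0.5 0.5]
print(baseline_shares(media_units, np.ones(2)))     # shares of media units, not of spend
print(baseline_shares(media_units, true_prices))    # matches the spend shares
```

With unit prices, channel 0 dominates the baseline simply because it has more impressions, even though both channels had equal spend.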

entzyeung commented 5 months ago

Dear LMMM Team,

I want to express my sincere appreciation for the hard work you've dedicated to creating this remarkable model. It's truly impressive and has proven to be user-friendly and easy to install, unlike my previous attempt with the FB Robyn model, which I eventually abandoned due to its complexity. Discovering your model felt like a ray of sunshine after a cloudy day.

To cut to the chase, I've encountered an issue similar to @DongHarry-Kang's. I've spent two weeks experimenting with various datasets and fine-tuning, but I consistently get similar results when calling the plot_pre_post_budget_allocation_comparison function.

I've included some sample outputs below:

[Image: Output 1]

[Image: Output 2]

I've thoroughly examined data quality using check_data_quality(), assessed variances with highlight_variances(), reviewed spend fractions with highlight_low_spend_fractions(), and analyzed variance inflation factors with highlight_high_vif_values(). The datasets appear to be in good shape and suitable for analysis. I even trained the best model using parameters obtained from grid searches. However, when it comes to the critical step of budget allocation, there seems to be an issue.

Regardless of how I adjust the budget and cost parameters, the model consistently produces the same outcome: maintaining the same channel ratios while suggesting a higher budget allocation.

Allow me to illustrate this with an example.

[Screenshot: 2024-01-07 at 22.55.23]

As you can see, the model suggested a different, higher budget while keeping the same channel ratios. I am confused about why the model does not reallocate the budget as the documentation describes, and why it keeps increasing the total budget instead. Are there any parameters I missed? If not, is there a way to fix this?
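To make "same ratios, higher budget" concrete, one can compare the per-channel shares and totals of the two allocations that go into the plot. The helper below is hypothetical (not part of LightweightMMM) and assumes `optimal` and `previous` are per-channel budgets in currency units, e.g. `prices * solution.x` and `prices * previous_media_allocation`. If the optimizer enforces the budget as an equality constraint (as the library source suggests), both totals should be close to `budget`; a zero share change together with a total well above `budget` would hint that the solver did not converge, which may be worth checking via the SciPy result's `solution.success` and `solution.message`.

```python
import numpy as np


def compare_allocations(optimal, previous, budget: float) -> dict:
    """Summarize how an optimized per-channel budget differs from the baseline."""
    optimal = np.asarray(optimal, dtype=float)
    previous = np.asarray(previous, dtype=float)
    share_opt = optimal / optimal.sum()
    share_prev = previous / previous.sum()
    return {
        # Largest absolute change in any channel's share of the total.
        "max_share_change": float(np.abs(share_opt - share_prev).max()),
        # Totals relative to the requested budget (1.0 means the budget is met).
        "optimal_total_vs_budget": float(optimal.sum() / budget),
        "previous_total_vs_budget": float(previous.sum() / budget),
    }


# Hypothetical numbers mimicking the reported symptom: same ratios, 10% larger total.
previous = np.array([500.0, 300.0, 200.0])
optimal = previous * 1.1
print(compare_allocations(optimal, previous, budget=1000.0))
```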

Here is the code I use to call the optimization function:

Run optimization with the parameters of choice.

from lightweight_mmm import optimize_media, plot

solution, kpi_without_optim, previous_media_allocation = optimize_media.find_optimal_budgets(
    n_time_periods=n_time_periods,
    media_mix_model=mmm,
    extra_features=extra_features_test[:n_time_periods],  # already scaled
    budget=budget,
    prices=prices,
    media_scaler=media_scaler,
    target_scaler=target_scaler,
)

Plot out pre post optimization budget allocation and predicted target variable comparison.

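# Convert media units into per-channel budgets before plotting.
optimal_buget_allocation = prices * solution.x
previous_budget_allocation = prices * previous_media_allocation

plot.plot_pre_post_budget_allocation_comparison(
    media_mix_model=mmm,
    kpi_with_optim=solution['fun'],
    kpi_without_optim=kpi_without_optim,
    optimal_buget_allocation=optimal_buget_allocation,  # "buget" spelling kept from the original call
    previous_budget_allocation=previous_budget_allocation,
    figure_size=(10, 10))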

Thank you in advance.

shubhamgupta568 commented 5 months ago

@entzyeung, can you please share your code? I faced a similar issue and was able to resolve it.

entzyeung commented 5 months ago

@shubhamgupta568, thank you for dropping by. Here is the link to the notebook. Please let me know what caused this and which parts of the code should be corrected. Much appreciated!

shubhamgupta568 commented 5 months ago

@entzyeung, I can see only two problems: the way you defined prices and budget. I use the code below to compute prices and budget. Please try it and let me know if it fixes your issue. Note that costs and media_data here are unscaled values.

import jax.numpy as jnp
prices = costs / media_data.sum(axis=0)  # costs and media_data are unscaled
budget = jnp.sum(jnp.dot(prices, media_data.mean(axis=0))) * n_time_periods
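With these definitions, `prices` is the average cost per media unit for each channel, and `budget` works out to the average total spend per period times `n_time_periods`, because the media units cancel: for each channel c, (cost_c / Σ_t m_tc) · (Σ_t m_tc / T) = cost_c / T. The sketch below checks that identity on made-up data, assuming `costs` holds total unscaled spend per channel over the T historical periods and `media_data` has shape (T, n_channels). It uses NumPy; the same expressions work with `jax.numpy`.

```python
import numpy as np

# Hypothetical unscaled history: T = 4 periods, 3 channels (media units, e.g. impressions).
media_data = np.array([
    [1000.0, 200.0, 50.0],
    [1200.0, 180.0, 40.0],
    [900.0, 220.0, 60.0],
    [1100.0, 200.0, 50.0],
])
costs = np.array([4200.0, 1600.0, 800.0])  # total spend per channel over the 4 periods
n_time_periods = 10

# Average cost per media unit for each channel.
prices = costs / media_data.sum(axis=0)

# Budget for n_time_periods at the historical average spending rate.
budget = np.sum(np.dot(prices, media_data.mean(axis=0))) * n_time_periods

# Media units cancel, so this equals average total spend per period * n_time_periods.
T = media_data.shape[0]
assert np.isclose(budget, costs.sum() / T * n_time_periods)
print(budget)  # 16500.0
```

In other words, this budget reproduces historical spending, so a well-behaved optimizer should only redistribute it across channels rather than change its total.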

entzyeung commented 5 months ago

Thank you @shubhamgupta568 for the solution; I am running the code again now. The first run gave me the same distribution, so I am running it again. Will keep you posted. What caused the problem? Is it the dtype of media_data?

entzyeung commented 5 months ago

Hello @shubhamgupta568, I have tried many times and made sure that both costs and media_data are unscaled jnp arrays. Here are the results:

[Screenshots: 2024-01-11 at 21.17.49 and 2024-01-11 at 21.18.07]

As you can see, the budget allocation percentages are the same; only the total budget is inflated a bit. I wondered whether it was a data type problem, but I have made sure they are unscaled jnp arrays, so I really have no idea what is happening here. Do you have any idea? I don't mind exchanging notebooks if you want.

shubhamgupta568 commented 5 months ago

@entzyeung, the changes to prices and budget I suggested earlier were meant to keep your prices and budget consistent with the original data. I think the reason you might be getting the same distribution is the upper and lower bound percentages. Can you try changing those too? Maybe something like upper bound 2 and lower bound 0.5.

entzyeung commented 5 months ago

I haven't set them directly yet; I was just using the defaults. Are they set in the predict function?

On Fri, 12 Jan 2024, 06:31, Shubham Gupta wrote:

I think the reason you might be getting the same distribution is the upper and lower bound percentages. Can you try changing those too? Maybe something like upper bound 2 and lower bound 0.5.


shubhamgupta568 commented 5 months ago

@entzyeung, they are parameters of the find_optimal_budgets function. By default, both bounds_lower_pct and bounds_upper_pct are 0.2.
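For reference, here is a sketch of how per-channel bounds of this kind can be derived, assuming (from a reading of the library source, not its documentation) that each channel may move between its historical mean media units times (1 - bounds_lower_pct) and times (1 + bounds_upper_pct), multiplied by n_time_periods. It also checks whether `budget` is reachable within those bounds; the library appears to log a warning when it is not. The function name and data are hypothetical.

```python
import numpy as np


def budget_bounds(media_data, prices, n_time_periods, lower_pct=0.2, upper_pct=0.2):
    """Return (min_budget, max_budget) reachable under +/- percentage bounds.

    media_data: unscaled media units, shape (n_periods, n_channels).
    prices: cost per media unit, shape (n_channels,).
    """
    mean_units = np.asarray(media_data, dtype=float).mean(axis=0)
    # Per-channel bounds in media units over the whole optimization window.
    lower = np.maximum(mean_units * (1 - lower_pct), 0) * n_time_periods
    upper = mean_units * (1 + upper_pct) * n_time_periods
    # Cheapest and most expensive allocations allowed by the bounds.
    return float(np.dot(lower, prices)), float(np.dot(upper, prices))


media_data = np.array([[1000.0, 200.0], [1100.0, 180.0], [900.0, 220.0]])
prices = np.array([1.0, 2.0])
n_time_periods = 10

lo, hi = budget_bounds(media_data, prices, n_time_periods)
budget = 20000.0  # e.g. a budget raised well above historical spend
if not lo <= budget <= hi:
    print(f"budget {budget} is outside the feasible range [{lo}, {hi}]")
```

With the default 0.2 bounds the feasible range here is 11,200 to 16,800, so a budget of 20,000 cannot be met; widening the bounds to lower 0.5 and upper 2 stretches it to 7,000 to 42,000.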

entzyeung commented 5 months ago

Hello @shubhamgupta568, here is my code:

solution, kpi_without_optim, previous_media_allocation = optimize_media.find_optimal_budgets(
    n_time_periods=n_time_periods,
    media_mix_model=mmm,
    extra_features=extra_features_test[:n_time_periods],  # transformed; forecast period [:n_time_periods]
    budget=budget,
    prices=prices,
    media_scaler=media_scaler,
    target_scaler=target_scaler,
    seed=SEED,
    bounds_upper_pct=0.05,
    bounds_lower_pct=0.05,
)

I tried multiple combinations of upper and lower percentages, and all of them gave me the same percentage distribution. If I set the upper and lower percentages below 0.2, the bars differ in height while the percentages stay the same. If I set them above 0.2, the bars have exactly the same height.
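To see whether the bounds, rather than the fitted response curves, drive this behavior, one can check where each channel of the optimized solution ends up. The helper below is hypothetical: `x` stands for the optimized media units (e.g. `solution.x`), `x0` for the starting allocation (e.g. `previous_media_allocation`), and `lb`/`ub` for per-channel bounds in the same units. If every channel comes back "unchanged", the solver most likely did not move from its starting point, which would match identical bar heights; channels pinned "at lower bound" or "at upper bound" suggest the bounds are binding.

```python
import numpy as np


def classify_channels(x, x0, lb, ub, rtol=1e-6):
    """Label each channel of an optimized allocation relative to its bounds and start."""
    x, x0, lb, ub = (np.asarray(a, dtype=float) for a in (x, x0, lb, ub))
    labels = []
    for xi, x0i, lbi, ubi in zip(x, x0, lb, ub):
        if np.isclose(xi, lbi, rtol=rtol):
            labels.append("at lower bound")
        elif np.isclose(xi, ubi, rtol=rtol):
            labels.append("at upper bound")
        elif np.isclose(xi, x0i, rtol=rtol):
            labels.append("unchanged")
        else:
            labels.append("interior")
    return labels


# Hypothetical example with bounds_lower_pct = bounds_upper_pct = 0.05.
x0 = np.array([100.0, 50.0, 20.0])
lb, ub = 0.95 * x0, 1.05 * x0
x = np.array([105.0, 47.5, 20.0])
print(classify_channels(x, x0, lb, ub))  # ['at upper bound', 'at lower bound', 'unchanged']
```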

The graph below uses bounds pct = 0.3 (upper bound 2 and lower bound 0.5 give exactly the same distribution and the same bar heights as this graph):

[Image: bounds pct = 0.3]

The graph below uses bounds pct = 0.05:

[Image: bounds pct = 0.05]