google / lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.
https://lightweight-mmm.readthedocs.io/en/latest/index.html
Apache License 2.0
891 stars, 191 forks

Budget Optimization - Predicted target value #237

Open ar-asur opened 1 year ago

ar-asur commented 1 year ago

Hi team, I am building an MMM with 8 channels and 2 extra features. It is a geo model at the US state level (51 geos). I fit it with the adstock model and ran sanity checks for convergence (prior vs. posterior distributions, r_hat between 1 and 1.01, n_eff > 200). When I use the budget optimization function, the predicted target values, both pre-optimization and post-optimization, are very different from the actual values. I have attached the plot from the optimization function, which shows values close to 9M, while the actual value is around 600k.

Since my extra features are significant, I tried both including and excluding them, but the results are similar. The model validation error (MAPE) is around 10-15% for all geos. I double-checked my scaler functions as well. Has anyone faced a similar issue? I am looking for suggestions on where to investigate.

[Attached image: Screenshot 2023-08-22 at 5 06 13 PM, plot from the optimization function]

ar-asur commented 1 year ago

Following up on the issue,

I am looking into the _objective_function method in the optimize_media.py file. My setup is 51 geos and 8 channels. My understanding is that the following code should divide the given budget for the channels based on the historical data share and then repeat it for each of the n_time_periods. When I run the following code on its own, I do not get a repeated array.

media_values = geo_ratio * jnp.expand_dims(media_values, axis=-1)
media_values = jnp.tile(media_values / media_input_shape[0], reps=media_input_shape[0])
# Distribute budget of each channels across time.
media_values = jnp.reshape(a=media_values, newshape=media_input_shape)

When I change the above code to the following code, I am getting spend divided across the channels and repeated across n_time_periods.

media_values = geo_ratio * jnp.expand_dims(media_values, axis=-1)
media_values = jnp.tile(media_values / media_input_shape[0], reps=(media_input_shape[0], 1, 1))

Now the predictions from the optimization are in the range of the actual values. I am not sure whether this is a version issue or whether I am doing something incorrectly.

Any help would be appreciated. Thanks!

jorgemadridm19 commented 1 year ago

@ar-asur Can you share the code used for the optimization? I haven't been able to make it work.

rajat-barve commented 1 year ago

@ar-asur, sorry, I can't help with your question since I am very new to MMM myself. Could you please let me know how you did your sanity checks and how you learned about them? Is there a source you would recommend?

ar-asur commented 1 year ago

@jorgemadridm19 The code in the example notebooks worked for me for both national and geo-level models. I used them as the starting point. When you say it doesn't work, do you mean issues with the code or with the results?

ar-asur commented 1 year ago

@rajat-barve For the sanity checks, you can search for MCMC output analysis/interpretation and use the articles and videos to understand the summary output. Another step I took was to compare the prior and posterior distributions to check whether they are the same or whether the model has learned anything from the data. For business validation, look at the attribution and the lag weight parameter distributions and check whether they make sense.
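To make the r_hat check more concrete, here is a minimal sketch of the classic (non-split) Gelman-Rubin statistic computed by hand with jax.numpy. The helper name gelman_rubin, the toy chain shapes, and the synthetic samples are all assumptions for illustration. This is not the code the library or its summary output runs. It only shows why chains that agree give a value near 1 and chains that disagree give a much larger one.

```python
import jax
import jax.numpy as jnp


def gelman_rubin(samples):
    """Classic (non-split) R-hat for samples of shape (n_chains, n_draws)."""
    n_draws = samples.shape[1]
    chain_means = samples.mean(axis=1)
    # W: average of the within-chain variances.
    within = samples.var(axis=1, ddof=1).mean()
    # B: between-chain variance, scaled by the number of draws.
    between = n_draws * chain_means.var(ddof=1)
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return jnp.sqrt(var_hat / within)


# Synthetic stand-ins for posterior samples of one parameter (assumption).
key = jax.random.PRNGKey(0)
mixed = jax.random.normal(key, (4, 1000))  # 4 chains that agree
# Shift two of the chains so they sit around a different value.
stuck = mixed + jnp.array([0.0, 0.0, 3.0, 3.0])[:, None]

r_hat_mixed = float(gelman_rubin(mixed))
r_hat_stuck = float(gelman_rubin(stuck))
print(r_hat_mixed)  # close to 1 when the chains agree
print(r_hat_stuck)  # well above 1 when the chains disagree
```

A value well above 1 for any parameter suggests the chains have not converged to the same distribution, which is the reason for looking at r_hat before trusting the attribution or the optimization.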

S-YIN commented 1 year ago

I did the same thing. I believe it is a bug in this version of LightweightMMM. According to the jnp.tile documentation, when reps is a scalar such as media_input_shape[0], 1s are prepended to it (in front of media_input_shape[0], not after), so the last axis is tiled rather than a new leading time axis.
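The difference between the two reps arguments can be reproduced on a toy array. This is a minimal sketch rather than the library's code: the sizes (3 time periods, 2 channels, 2 geos) and the per_channel_geo values are made up, and the division by the number of time periods is left out so the numbers stay easy to read.

```python
import jax.numpy as jnp

# Hypothetical toy sizes: 3 time periods, 2 channels, 2 geos.
n_time_periods, n_channels, n_geos = 3, 2, 2
media_input_shape = (n_time_periods, n_channels, n_geos)

# Stand-in for `geo_ratio * jnp.expand_dims(media_values, axis=-1)`:
# one budget value per (channel, geo).
per_channel_geo = jnp.array([[1.0, 2.0],
                             [3.0, 4.0]])

# A scalar reps is promoted to (1, 3), so the geo axis is tiled, not time.
tiled_scalar = jnp.tile(per_channel_geo, reps=media_input_shape[0])
# Reshaping afterwards mixes values from different channels and periods.
scrambled = jnp.reshape(tiled_scalar, media_input_shape)

# A tuple reps adds a leading time axis and repeats the whole
# (channel, geo) block once per time period.
repeated = jnp.tile(per_channel_geo, reps=(media_input_shape[0], 1, 1))

print(tiled_scalar.shape)  # (2, 6)
print(repeated.shape)      # (3, 2, 2)
print(scrambled[0])        # rows [1, 2] and [1, 2]: the second channel is lost
print(repeated[0])         # rows [1, 2] and [3, 4]: same as per_channel_geo
```

In this toy case the total over all entries is the same for both arrays, but the per-channel totals are not: the first channel sums to 13 in the scrambled array and to 9 in the repeated one. So the scalar reps followed by the reshape can change how much budget each channel appears to receive in each period.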