google / lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.
https://lightweight-mmm.readthedocs.io/en/latest/index.html
Apache License 2.0

limitations in the use of extra features #273

Closed mims-b closed 10 months ago

mims-b commented 10 months ago

Hi I'm new to LMMM and modelling in general, I want to ask if there are limitations in what can be used as inputs for extra features (i.e. the control variables). For example can I have a column containing a continuous variable like "weekly average gas price" and then in another column have something like "holiday" which will be a binary variable and display 1 if there is a major holiday in that week and 0 if not. Thanks.

becksimpson commented 10 months ago

@mims-b The only limitation on extra features is that they are modelled linearly, with a single multiplier coefficient per feature that has a zero-centered Normal prior.
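To make that concrete, here is a small prior-predictive sketch in plain NumPy. This is not LightweightMMM's actual code; the feature names, values, and prior scale are made-up illustrations of "one coefficient per feature, zero-centered Normal prior, purely linear contribution":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weekly data: a continuous variable (gas price) and a binary one (holiday).
n_weeks = 104
gas_price = rng.normal(260.0, 5.0, size=n_weeks)
holiday = (rng.random(n_weeks) < 0.05).astype(float)
extra_features = np.column_stack([gas_price, holiday])

# One coefficient per extra feature, drawn from a zero-centered Normal prior,
# so each coefficient can come out positive or negative.
coef = rng.normal(loc=0.0, scale=1.0, size=extra_features.shape[1])

# The contribution to the target is purely linear in the features.
extra_contribution = extra_features @ coef
```

The key point is that there is no saturation or carryover transform applied to extra features, unlike media channels; their effect is just `features @ coefficients`.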

Because the intercept and media effects are constrained to be strictly positive (they are modelled with HalfNormals etc.), I think it is especially important to zero-center or zero-min your extra features during pre-processing, i.e. not leave them with a large positive offset. Unlike in an unconstrained regression, the lowest these other contributors (paid media, intercept) can go is zero, so they cannot provide a negative offset to compensate. Say gas price varies from 2700 to 2900, and that 200-unit change drives a ~20% change in the target (~0.2 after scaling). To capture that effect, the model would also have to introduce a constant positive bias of ~2.7, which nothing could offset, since there are no negative learned contributors. So the impact of the gas-price variation could not be learned at all. As you probably know, this sort of pre-processing is good practice in any problem; it's just that in this model form we go from "this might worsen convergence, fit, or training time" to "there is no way the model can capture this effect".
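A minimal sketch of the two pre-processing options mentioned above, using made-up numbers (the exact scaler you use is your choice; LightweightMMM also ships its own scaling utilities):

```python
import numpy as np

# Hypothetical raw extra features: gas price has a large positive offset from zero.
gas_price = np.array([2700.0, 2750.0, 2800.0, 2850.0, 2900.0])
holiday = np.array([0.0, 1.0, 0.0, 0.0, 1.0])

# Option 1 -- zero-min: shift so the minimum is 0; only the variation remains,
# so a learned coefficient no longer drags in a constant positive bias.
gas_price_zero_min = gas_price - gas_price.min()

# Option 2 -- zero-center (optionally also scale by the standard deviation).
gas_price_centered = (gas_price - gas_price.mean()) / gas_price.std()

# The binary holiday flag already has a minimum of 0, so it can stay as-is.
```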

In addition, extra features should not sit anywhere in the causal chain between advertisement and target, or you'll be cannibalising media credit. For example, if your target is sales, bad extra features would be activity on your website, clicks, or items added to basket, since much of that activity is itself driven by media advertisement further up the chain. The two examples you gave, gas price and holiday, are both good.

mims-b commented 10 months ago

Hi @becksimpson, thanks for your message. I don't fully understand the second paragraph, but here is my interpretation; please correct me where I'm wrong (in the example below I'll use sales of ready-made meals as my target).

Thanks for your time.

becksimpson commented 10 months ago

Yes to the first two points. Sorry, but no to the third: the control variables (extra features) can be modelled with a positive or negative effect. Each has a Normal prior on its single learned coefficient, which can take a positive or negative value. What I meant was the case where a control variable's minimum value has a large offset from zero. In my example, gas prices in the training set only vary between 250 and 270. Say the "true effect" of gas price is positive, a multiplier of 0.01. Then the impact of the 250-to-270 variation on the target would be +0.2 (0.01 x (270 - 250)). But the feature's minimum value is 250, so if the model were to learn that true 0.01 multiplier, it would also add a constant positive bias of 250 x 0.01 = 2.5 to every prediction. In unconstrained regression this isn't a huge problem: the intercept simply learns a negative value to compensate. In MMMs, however, the intercept is constrained to be positive. Therefore your model couldn't learn the positive relationship between the 250 -> 270 gas-price variation and your target, because it has no way to compensate for the large positive bias (250 x the learned parameter) that this would introduce into its predictions.
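The arithmetic in that explanation can be checked in a few lines (same hypothetical numbers: coefficient 0.01, gas price ranging 250 to 270):

```python
# Hypothetical "true" coefficient and observed feature range.
true_coef = 0.01
gas_min, gas_max = 250.0, 270.0

# Effect of the observed variation on the target: 0.01 * 20 = 0.2
effect_of_variation = true_coef * (gas_max - gas_min)

# Constant positive bias added to every prediction if the raw,
# un-shifted feature is used: 0.01 * 250 = 2.5
bias_from_offset = true_coef * gas_min

# After zero-min shifting (range becomes 0 to 20), the same coefficient
# captures the same variation effect with no constant bias.
shifted_min = 0.0
bias_after_shift = true_coef * shifted_min
```

Since nothing in the model can contribute negatively enough to absorb that 2.5 bias, shifting the feature first is what makes the 0.2 variation effect learnable.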

mims-b commented 10 months ago

Many thanks for the explanation, all clear now.