Closed mims-b closed 10 months ago
@mims-b The only limitation on extra features is that they are modelled linearly: each one gets a single multiplier coefficient with a zero-centered normal prior.
Because the intercept and media effects are constrained to be strictly positive, I think it is especially important to make sure your extra features are zero-centered or zero-minned during pre-processing, i.e. that they don't carry a large positive bias. The lowest these other effects (paid media, intercept) can go is zero, since they are strictly positive (modelled with HalfNormals etc.), so unlike in a less constrained regression problem they cannot provide negative offsets.

Say gas price varies 2700 --> 2900, and that change of 200 caused a 20% change in the target (~0.2). That implies a multiplier of about 0.001, so to capture the effect the model would have to create a positive bias of ~2.7 (2700 x 0.001), which couldn't possibly be offset, as there are no negative learned contributors to do so. So the impact of gas price variation could not be learned. As you probably know, this sort of pre-processing is usually good practice in any problem; it's just that in this model form we go from "this might worsen model convergence or fit, make it take longer, etc." to "there is no way the model could model this".
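A minimal sketch of the zero-minning and zero-centering described above, in plain Python. The helper names `zero_min` and `zero_center` and the sample values are hypothetical and not part of LightweightMMM; in practice you would apply the same shift to the extra features array before passing it to the model.

```python
from statistics import fmean


def zero_min(values):
    """Shift a feature so that its smallest value becomes 0."""
    lowest = min(values)
    return [v - lowest for v in values]


def zero_center(values):
    """Shift a feature so that its mean becomes 0."""
    mean = fmean(values)
    return [v - mean for v in values]


# Hypothetical weekly gas prices with a large offset from zero.
gas_price = [2700.0, 2750.0, 2800.0, 2900.0]

# A binary holiday flag already has a minimum of 0, so it needs no shift.
holiday = [0, 1, 0, 0]

gas_zero_min = zero_min(gas_price)      # same variation, minimum moved to 0
gas_centered = zero_center(gas_price)   # same variation, mean moved to 0
```

Either shift keeps the 200-unit variation intact and only removes the constant offset, so the learned multiplier no longer has to drag a large baseline along with it.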
In addition, extra features should not appear anywhere in the causal chain between advertisement and target impact, or you'll be cannibalising credit. For example, suppose your target is sales. A bad extra feature would be activity on your website (clicks, items added to basket), as a lot of that activity would have been driven by media advertisement further up the chain. The two examples you gave, gas price and holiday, are good.
Hi @becksimpson, thanks for your message. I don't fully understand the contents of the second paragraph but this is my interpretation and please correct me where wrong (in the example below, I will use sales of ready-made meals as my target).
Thanks for your time.
Yes to the first two points. No to the third, sorry: the control variables (extra features) can be modelled with a positive or a negative effect. Their single learned parameter has a Normal prior distribution, so it can take on a positive or negative value.

What I meant was the case where your control channel's minimum value has a large offset from zero. The example I gave was gas prices that only vary between 250 and 270 in the training set. Let's say the "true effect" of gas prices is positive, a multiplier of 0.01. That means the impact of the variation between 250 and 270 on the target would be +0.2 (0.01 x (270 - 250)). However, your extra feature's min value is 250, so if your model were to learn this true 0.01 multiplier, it would create a positive bias on the target predictions of (250 x 0.01) = 2.5 that applies to all datapoints.

This isn't a huge problem in unconstrained regression: the intercept would simply learn a negative value to compensate for the positive bias. However, in MMMs we have a constrained intercept, which must be positive. Therefore your model couldn't learn the positive relationship between the 250 -> 270 gas price variation and your target, because it couldn't compensate for the huge positive bias that adding (250 x 'learned parameter') would introduce to your target predictions.
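The gas-price argument above can be checked with a toy fit. This is only a sketch under simplifying assumptions: it uses ordinary least squares with an optional non-negative intercept instead of the Bayesian model with priors, and the function name `fit_line`, the five data points, and the baseline of 1.0 are all hypothetical.

```python
def fit_line(x, y, nonneg_intercept):
    """Least-squares fit of y = intercept + slope * x.

    If nonneg_intercept is True and the unconstrained intercept would be
    negative, the intercept is pinned at 0 and the slope is refit through
    the origin (the constrained optimum for this convex problem).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if nonneg_intercept and intercept < 0:
        intercept = 0.0
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return intercept, slope


# Gas prices between 250 and 270, with a true multiplier of 0.01.
gas = [250.0, 255.0, 260.0, 265.0, 270.0]
true_slope = 0.01
# Assumed baseline: the target equals 1.0 at the lowest gas price.
target = [1.0 + true_slope * (g - 250.0) for g in gas]

# Raw feature, unconstrained: the intercept goes negative to absorb the bias.
raw_free = fit_line(gas, target, nonneg_intercept=False)

# Raw feature, intercept forced to be >= 0: the slope can no longer be 0.01.
raw_constrained = fit_line(gas, target, nonneg_intercept=True)

# Zero-minned feature, intercept forced to be >= 0: the slope is recoverable.
shifted = [g - min(gas) for g in gas]
shifted_constrained = fit_line(shifted, target, nonneg_intercept=True)

for name, (b0, b1) in [
    ("raw, free intercept", raw_free),
    ("raw, intercept >= 0", raw_constrained),
    ("zero-minned, intercept >= 0", shifted_constrained),
]:
    print(f"{name}: intercept={b0:.4f}, slope={b1:.5f}")
```

With the raw feature and a non-negative intercept, the fitted slope lands well below 0.01 because the fit has to spend the slope on explaining the baseline. After subtracting the minimum, the same constrained fit recovers the 0.01 multiplier.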
Many thanks for the explanation, all clear now.
Hi, I'm new to LMMM and modelling in general. I want to ask whether there are limitations on what can be used as inputs for extra features (i.e. the control variables). For example, can I have one column containing a continuous variable like "weekly average gas price" and another column containing something like "holiday", a binary variable that is 1 if there is a major holiday in that week and 0 if not? Thanks.