drbenvincent opened 2 months ago
Closed by #1032 ?
Think it would be better to include functions instead of notebooks
The only things I can think of would be:

1. `pm.set_data`
2. A more general `pm.do`

Option 2 would be useful when we want to convert a random variable node into an observed/data node (for example), and I suspect this is most useful when we have something structurally more complex than linear regression.
Do you think 2 would be something general enough and useful as a method?
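To illustrate what option 2 could look like, here is a minimal conceptual sketch of a do-style intervention on a toy structural model. The dict-based model representation and the `do` helper below are hypothetical illustrations of the idea, not the PyMC or pymc-marketing API:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy structural causal model: each node is a mechanism computed from
# the values of earlier nodes. Purely illustrative, not the PyMC API.
model = {
    "spend": lambda rng, v: rng.normal(100, 10),
    "sales": lambda rng, v: 2.0 * v["spend"] + rng.normal(0, 1),
}

def sample(model, rng):
    """Forward-sample the model in topological (insertion) order."""
    values = {}
    for name, mechanism in model.items():
        values[name] = mechanism(rng, values)
    return values

def do(model, interventions):
    """Return a new model where intervened nodes are fixed constants,
    severing their dependence on parents (the do-operator)."""
    new_model = dict(model)
    for name, value in interventions.items():
        new_model[name] = lambda rng, v, value=value: value
    return new_model

# Intervene: force spend to 150 regardless of its usual mechanism.
intervened = do(model, {"spend": 150.0})
draw = sample(intervened, rng)
```

The key point is that, unlike `set_data`, the intervention replaces the mechanism of *any* node, including a random variable, rather than only swapping the values of data containers.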
I think so, yes. If we want to emphasise the causal inference capabilities of pymc-marketing then I think these kinds of plots are pretty canonical (if that's the right word).
We would not assume people are doing parameter recovery; instead we would tailor the plots to the typical use case of inferring causal impact.
One type of plot could focus on visualizing the actual and counterfactual data scenarios. That could be a useful sanity check and communication tool.
The other plot could focus on the causal impact estimation.
This is perhaps a meta-proposal which may be fully addressed by multiple separate PRs. But the core idea is to build out the functionality (and the visibility of that functionality) on the causal inference side, so that will involve both code and documentation changes.
Right now I am focusing on MMMs. CLV components could be dealt with later.
## Add the `do` operator

We should add a method to allow a user to intervene upon the DAG. I believe it is currently possible for a user to use `set_data`, though the user does not do this directly - see the example of evaluating out-of-sample predictions. The addition of the do operator would enable greater flexibility in the interventions that a user can apply to the causal DAG. Currently, doing this with `set_data` alone restricts users to updating the values of data nodes, but the do operator would additionally allow interventions on latent / random variables.

## Showcase counterfactual inference with MMMs
We should also showcase the causal inference capabilities of `pymc-marketing`, though these capabilities currently require a relatively high degree of familiarity and custom coding. Below I outline an example use case, which could be used as the basis of new docs.

GOAL: We have conducted a media campaign where we changed media spend in some channel (or region, or relating to a product) and we want to evaluate the causal impact of that intervention upon our outcome variable.
The algorithmic approach would be:
The plot below shows this process visually. In particular, we have simulated the counterfactual scenario of 'business as usual', as well as an 'actual' scenario which includes an intervention on media spend. In the plot below the orange lines represent the 'actual' spend (top 2 panels) and outcome variable, while the blue lines represent the spend in the 'counterfactual' scenario. We can see that actual spend is increased in the intervention period, the start of which is indicated by the red dashed lines. The outcome plot shows that there is some increase in the outcome variable caused by the increased spend on the x1 channel. The causal impact (the difference) is shown in the bottom plot.
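The simulated scenarios described above can be sketched in a few lines of numpy. The single channel, the linear effect size, and all the numbers below are illustrative assumptions, not the actual simulation used for the plots:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t0 = 100, 70  # 100 time steps; the intervention starts at t = 70
beta = 0.8       # assumed (known, for simulation) channel effect

# 'Business as usual' (counterfactual) spend on channel x1.
spend_cf = 10 + rng.normal(0, 1, size=n)

# 'Actual' spend: identical before the intervention, elevated afterwards.
spend_actual = spend_cf.copy()
spend_actual[t0:] += 5.0

# Same noise in both scenarios, so only the spend change drives the difference.
noise = rng.normal(0, 0.5, size=n)
outcome_cf = 20 + beta * spend_cf + noise
outcome_actual = 20 + beta * spend_actual + noise

# Ground-truth causal impact: zero pre-intervention by construction.
causal_impact = outcome_actual - outcome_cf
```

Because the noise is shared between scenarios, the true causal impact here is exactly zero before the intervention and exactly `beta * 5.0` after it, which is the target the inference procedure should recover.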
The question is, how well can we recover the causal impact by running the algorithm specified above? The plot below summarises the attempt to do this with the causal inference capabilities of `pymc-marketing`:

Top panel: Shows the observed outcomes (solid black line) along with the posterior predictive fit to the observed data (orange HDIs). The blue shaded HDI regions show what the model believes would have occurred under the counterfactual scenario. We can see slightly elevated sales; however, there is some overlap in the HDIs of the actual and counterfactual scenarios.
Middle panel: Shows the causal impact - the difference between the posterior predictions under the actual and counterfactual scenarios. NOTE: it may be more sensible to compare the predictions under the counterfactual scenario to the empirically observed outcomes. We can see that the mean and HDIs indicate no meaningful causal impact in the pre-intervention period. However, during the intervention there is some evidence that the outcome variable is elevated.
Bottom panel: Shows the cumulative causal impact of the intervention upon the outcome variable from the point the intervention begins onwards. This indicates much greater certainty in a non-zero impact of the intervention.
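To make the panel summaries concrete, here is a sketch of how the middle and bottom panels could be computed from posterior predictive draws. The arrays below are random stand-ins for real model output (in practice they would come from posterior predictive sampling under each scenario), and `np.percentile` is a crude stand-in for a proper HDI:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n, t0 = 500, 100, 70

# Stand-ins for posterior predictive draws (draws x time) under each
# scenario: no difference pre-intervention, an impact of ~4 afterwards.
pred_cf = rng.normal(0.0, 1.0, size=(n_draws, n))
pred_actual = rng.normal(4.0, 1.0, size=(n_draws, n))
pred_actual[:, :t0] = pred_cf[:, :t0] + rng.normal(0, 0.1, size=(n_draws, t0))

# Middle panel: per-timestep causal impact draws, mean, and an interval.
impact = pred_actual - pred_cf
impact_mean = impact.mean(axis=0)
lo, hi = np.percentile(impact, [3, 97], axis=0)  # ~94% central interval

# Bottom panel: cumulative impact from the intervention onwards,
# accumulated within each posterior draw before summarising.
cum_impact = np.cumsum(impact[:, t0:], axis=1)
cum_mean = cum_impact.mean(axis=0)
```

Accumulating within each draw before averaging is what drives the bottom panel's narrower relative uncertainty: per-timestep noise partially cancels in the running sum while the true impact adds up.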