matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License

Chapter 18: Comparing predictive / causal and dependent-variable / elasticity at the same time #217

Open epenn-uber opened 2 years ago

epenn-uber commented 2 years ago

There is an issue in chapter 18, in the following paragraph:

> The only problem is that prediction is not particularly useful here. Ultimately, we want to know when we can increase prices and when we can't. But once we look at the slopes of the lines in the predictive model partitions, we see that they don't change much. In other words, both partitions, as defined by the prediction model, have about the same responsiveness to price increase. This doesn't offer us much insight into which are the days we can increase prices, since it looks like price is not affecting sales at all.

The fair comparisons here should be:

matheusfacure commented 2 years ago

> Comparing changes in both dimensions attributes the differing segmentations to the architecture of the model rather than the type of prediction.

I agree with that, but the point here is to show exactly this. I'm not comparing GB with regression, but rather using prediction vs. using elasticity.

pmarkoo commented 1 year ago

First and foremost, thank you @matheusfacure for creating such an amazing and inspiring resource for Causal Inference!

I'm doubling down on the issue in Chapter 18, or rather on the constructive suggestions initiated by @epenn-uber. I had almost started opening my own issue when I discovered this existing one.

Chapter 18 is an excellent introduction to Heterogeneous Treatment Effects but, in my opinion, the train of thought is interrupted at the point where you introduce the Gradient Boosting model. The main idea and concept get somewhat lost there.

1) We started with the initial goal of distinguishing days with high price elasticity from days with low price elasticity. Before even building the Gradient Boosting model, I expected a very simple check of the elasticity predictions from the linear model. A quick boxplot of the linear model's elasticity predictions gives this interesting graph:

[figure: boxplot of the linear model's elasticity predictions per day of week]

Days 1 and 7 show a significantly different elasticity from the other days (an approximate elasticity of $|\epsilon| = 1$). For the remaining days, elasticity is close to zero, so those are probably good candidates for days when we could charge more with a lower risk of a drop in sales.
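This check can be sketched as follows on synthetic data. The column names (`price`, `sales`, `weekday`), the interacted OLS formula, and the data-generating process (only days 1 and 7 respond to price) are my own assumptions for illustration, not the book's exact code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data-generating process: only days 1 and 7 respond to price,
# and those days also have a higher sales baseline
rng = np.random.default_rng(0)
n = 700
df = pd.DataFrame({"weekday": rng.integers(1, 8, n),
                   "price": rng.uniform(3, 10, n)})
weekend = df["weekday"].isin([1, 7])
df["sales"] = 200 + 10 * weekend - 2 * df["price"] * weekend + rng.normal(0, 1, n)

# linear model with a price-by-weekday interaction
m = smf.ols("sales ~ price * C(weekday)", data=df).fit()

# predicted unit "elasticity": d(sales)/d(price), via a finite difference
# on the model's predictions
df["pred_elast"] = m.predict(df.assign(price=df["price"] + 1)) - m.predict(df)

# df.boxplot(column="pred_elast", by="weekday")  # would draw the boxplot above
day_elast = df.groupby("weekday")["pred_elast"].mean().round(2)
print(day_elast)
```

For a linear model the finite difference recovers each weekday's price coefficient exactly, so the boxplot collapses to one value per day: clearly negative for days 1 and 7, near zero for the rest.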

2) Then comes the part that is a bit unclear to me: the introduction of a segmentation based on sales predictions. Comparing group characteristics based on elasticity from one model and on sales predictions from another model looks like comparing apples and oranges. Besides, do we even need a model for sales prediction at all? We could just use the empirical sales values and segment on those, if we needed a segmentation in the first place.

3) Let's have a look at the boxplot of sales per day. The borderline between the two groups, based on the ML sales predictions (a value of 198.735 sales), is shown as a dashed red line. We can see that the two groups are completely arbitrary in the context of days of the week: the lower group contains half of days 2, 3, 4, 5, 6 plus a couple of data points from days 1 and 7, while the upper group contains almost all of days 1 and 7 plus the other half of days 2, 3, 4, 5, 6.

[figure: boxplot of sales per day of week, with the 198.735 split shown as a dashed red line]

On the other hand, looking at the earlier elasticity boxplot, the two groups based on elasticity separate cleanly: the lower group is all of days 1 and 7, and the upper group is all of days 2, 3, 4, 5, 6. This illustrates my point that we are obviously comparing apples and oranges.
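The contrast between the two segmentations can be made concrete on synthetic data. Everything here (column names, the data-generating process in which only days 1 and 7 respond to price but have a higher sales baseline, the -1 elasticity cutoff) is an illustrative assumption; the sales split uses the empirical median, since a fitted sales model is not even needed for this point:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data-generating process
rng = np.random.default_rng(0)
n = 700
df = pd.DataFrame({"weekday": rng.integers(1, 8, n),
                   "price": rng.uniform(3, 10, n)})
weekend = df["weekday"].isin([1, 7])
df["sales"] = 200 + 10 * weekend - 2 * df["price"] * weekend + rng.normal(0, 1, n)

# split 1: by sales level (empirical median -- no model needed)
df["sales_group"] = np.where(df["sales"] > df["sales"].median(), "high", "low")

# split 2: by predicted elasticity from the interacted linear model
m = smf.ols("sales ~ price * C(weekday)", data=df).fit()
elast = m.predict(df.assign(price=df["price"] + 1)) - m.predict(df)
df["elast_group"] = np.where(elast < -1, "elastic", "inelastic")

# the sales split scatters every weekday across both groups...
ct_sales = pd.crosstab(df["weekday"], df["sales_group"])
print(ct_sales)
# ...while the elasticity split separates the days cleanly
ct_elast = pd.crosstab(df["weekday"], df["elast_group"])
print(ct_elast)
```

The first crosstab puts every weekday on both sides of the sales border, while the second assigns each weekday entirely to one elasticity group, matching the "arbitrary vs. clean" separation described above.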

4) Finally, I completely agree with @epenn-uber: it would only be a fair comparison if the elasticity decorator were applied to both models (gradient boosting and linear). I did the elasticity calculation with the Gradient Boosting model, and here is the result:

[figure: boxplot of the Gradient Boosting model's elasticity predictions per day of week]

As you can see, it is qualitatively very similar to the results obtained with linear regression. Again, this confirms the conclusion that days 1 and 7 have a significantly different elasticity from the other days, so we shouldn't change the price on those days. Elasticity predictions are therefore actually useful here.
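The same elasticity calculation for the boosting model can be sketched with a plain finite difference on its predictions. As before, the synthetic data-generating process and column names are illustrative assumptions, not the book's code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# hypothetical data-generating process: only days 1 and 7 respond to price
rng = np.random.default_rng(0)
n = 700
df = pd.DataFrame({"weekday": rng.integers(1, 8, n),
                   "price": rng.uniform(3, 10, n)})
weekend = df["weekday"].isin([1, 7])
df["sales"] = 200 + 10 * weekend - 2 * df["price"] * weekend + rng.normal(0, 1, n)

X = df[["weekday", "price"]]
gb = GradientBoostingRegressor(random_state=0).fit(X, df["sales"])

# "elasticity" from the boosting model: finite difference of its predictions
# under a unit price increase
df["gb_elast"] = gb.predict(X.assign(price=X["price"] + 1)) - gb.predict(X)

# df.boxplot(column="gb_elast", by="weekday")  # would draw the boxplot above
day_elast = df.groupby("weekday")["gb_elast"].mean().round(2)
print(day_elast)
```

Because trees produce step functions, the finite difference is noisier and somewhat attenuated near the edge of the observed price range, but the qualitative picture matches the linear model: clearly negative elasticity on days 1 and 7, roughly zero elsewhere.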

Let me know if this all makes sense. Again, I've learned and still learning a lot from your book and I'm extremely grateful for your effort!

pmarkoo commented 1 year ago

@matheusfacure FYI

As additional support for the above: after checking how the dataset was generated, we can conclude that the higher elasticities on days 1 and 7 make complete sense. They are totally expected and were (probably) intentionally planned by the data creator:

[figure: data-generating process showing higher elasticity on days 1 and 7]