Hi @robotenique
You might find some ideas in the paper by Künzel et al. (2019): Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning.
The authors argue that
1) the S-Learner can perform poorly if the base algorithm is the lasso or a random forest/decision tree, because both algorithms can end up ignoring the treatment variable entirely. On the other hand, this can also be beneficial if the CATE is very often 0 (cf. p. 6)
2) for the T-Learner it is hard to pick up structure that is shared across the treatment and control groups, because the two groups are modeled completely separately (I guess this holds for any base algorithm, whether we use linear models or tree-based models) (cf. p. 6)
However, I could hardly find any scenario in which one meta-learner consistently outperforms the others. It always comes down to testing multiple approaches! (See the sketch of both learners below.)
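For what it's worth, here is a minimal sketch of the two learners with scikit-learn, which maybe makes the "one model vs. two models" distinction concrete. The choice of GradientBoostingRegressor as base learner is arbitrary, and the function names are mine, not from any particular package:

```python
# Minimal sketch of an S-learner and a T-learner for a binary treatment t,
# covariates X, and outcome y (illustrative only, not a reference implementation).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def s_learner_cate(X, t, y, X_new):
    """S-learner: fit ONE model on [X, t], then take the difference of
    predictions with the treatment indicator set to 1 vs. 0."""
    model = GradientBoostingRegressor()
    model.fit(np.column_stack([X, t]), y)
    pred_treated = model.predict(np.column_stack([X_new, np.ones(len(X_new))]))
    pred_control = model.predict(np.column_stack([X_new, np.zeros(len(X_new))]))
    return pred_treated - pred_control

def t_learner_cate(X, t, y, X_new):
    """T-learner: fit TWO separate models, one on the treated group and one
    on the control group, then take the difference of their predictions."""
    model_treated = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
    model_control = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
    return model_treated.predict(X_new) - model_control.predict(X_new)
```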
Makes sense. I was reading this blog post: https://doordash.engineering/2020/09/18/causal-modeling-to-get-more-value-from-flat-experiment-results/ and they argue the following:
For some base-learners, the S-learner’s estimate of the treatment effect can be biased toward zero when the influence of the treatment T on the outcome Y is small relative to the influence of the attributes X. This is particularly pronounced in the case of tree-based models, as in this circumstance the learner may rarely choose to split on T. The S-learner is generally a good choice when the average treatment effect is large, the data set is small and the interpretability of the result is important.
vs
The T-learner is a good choice with tree-based base-learners when the treatment effect is small and may not be well-estimated by an S-learner.
Which I found really interesting, but I didn't understand the theoretical aspects that lead to this conclusion.
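If it helps, the "biased toward zero" argument can be seen in a toy simulation (the data below is entirely synthetic and made up for illustration): when the influence of X on Y is much larger than a small constant treatment effect, a tree-based S-learner rarely splits on T, so its CATE estimates tend to be pulled toward 0, while the T-learner is noisier but is not shrunk in the same way.

```python
# Toy simulation of the attenuation argument from the blog post, under
# assumed synthetic data: X dominates Y, the treatment effect tau is small.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)            # randomized binary treatment
tau = 0.1                                  # true (small) constant treatment effect
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + tau * t + rng.normal(size=n)

# S-learner: one forest on [X, t]; trees rarely split on t here.
s_model = RandomForestRegressor(n_estimators=200, random_state=0)
s_model.fit(np.column_stack([X, t]), y)
cate_s = (s_model.predict(np.column_stack([X, np.ones(n)]))
          - s_model.predict(np.column_stack([X, np.zeros(n)])))

# T-learner: separate forests per treatment group.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[t == 0], y[t == 0])
cate_t = m1.predict(X) - m0.predict(X)

print("true effect:          ", tau)
print("S-learner mean CATE:  ", cate_s.mean())   # typically attenuated toward 0
print("T-learner mean CATE:  ", cate_t.mean())   # noisier, but not forced toward 0
```

(How strong the attenuation looks on a toy example like this depends on the forest hyperparameters and the noise level, but the mechanism is the one described in the quote: the trees simply don't find T worth splitting on.)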
@jroessler Another thing: do you have any references on T-learners and S-learners for a continuous treatment?
Hi, I have a newbie question.
When would we prefer to use a T-Learner instead of an S-Learner to estimate the CATE? I'm thinking of the scenario of a randomized experiment (given some confounders that we know of).
I read somewhere that S-learners might perform poorly if we use a linear model, but since we can use something like a tree or gradient boosting instead, what are the real advantages of using a T-Learner?
If anyone has some thoughts or research reference I'd love to check it out!
Thanks