uber / causalml

Uplift modeling and causal inference with machine learning algorithms

Propensity Score Estimation with Classifiers #82

Closed: MaximilianFranz closed this issue 4 years ago

MaximilianFranz commented 4 years ago

Hey there!

I was wondering about the best way to do propensity score estimation in the models that require it. I saw you are using an ElasticNetCV with clipping as the default. What were the considerations behind that decision? From what I know, using regressors to estimate class probabilities is tricky, unless it's a LogisticRegression (which seems to be part of the ElasticNet model, right?).
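For reference, here's roughly how I've been calling it, as a minimal sketch on synthetic data; I'm assuming the `ElasticNetPropensityModel.fit_predict` API from the README:

```python
# Minimal sketch of the current default, assuming the README API
# (ElasticNetPropensityModel in causalml.propensity); data is synthetic.
import numpy as np
from causalml.propensity import ElasticNetPropensityModel

rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 5))
# treatment assignment depends on the first covariate
treatment = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))

pm = ElasticNetPropensityModel(random_state=42)
ps = pm.fit_predict(X, treatment)  # clipped propensity scores in (0, 1)
```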

When comparing the ElasticNetPropensityModel with LogisticRegression and a calibrated RandomForestClassifier, I observed that the predictions are not as meaningful as they could or should be, because the probabilities are skewed. The plot below shows how well each method's probability estimates concur with our intuition about the likelihood of the event occurring. Or, as the folks at sklearn put it:

For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

I believe this is also what the propensity score should reflect. Thus, using their recommended ways for calibration seems reasonable.
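To make the comparison concrete, this is the kind of check I ran (a sketch using sklearn's `calibration_curve`; the data and model choices are just for illustration):

```python
# Sketch of a calibration check: compare a plain logistic regression
# against an isotonic-calibrated random forest on synthetic data.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 5))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
X_tr, X_te, t_tr, t_te = train_test_split(X, t, random_state=0)

models = {
    'logistic': LogisticRegression().fit(X_tr, t_tr),
    'calibrated_rf': CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=100), method='isotonic'
    ).fit(X_tr, t_tr),
}

for name, model in models.items():
    ps = model.predict_proba(X_te)[:, 1]
    # frac_pos is the observed share of treated units per score bin;
    # for a well-calibrated model it stays close to mean_pred in every bin
    frac_pos, mean_pred = calibration_curve(t_te, ps, n_bins=10)
    print(name, np.abs(frac_pos - mean_pred).mean())
```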

[calibration plot: reliability curves for ElasticNetPropensityModel, LogisticRegression, and the calibrated RandomForestClassifier]

For our own model-comparison framework, we're aiming to provide default propensity estimation as well, so I'm looking forward to hearing your thoughts on this!

Best, Max

jeongyoonlee commented 4 years ago

Thanks for your comment, @MaximilianFranz. We initially decided not to include full-scale propensity score modeling in causalml because our focus is on uplift modeling, and we think it's better to keep propensity estimation separate.

I'd like to hear your thoughts on the pros and cons of including propensity modeling as well. :)

That said, we just added calibrate() based on GAM to the propensity module in PR #80 because we noticed that some learners work better with well-calibrated propensity scores.
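The idea is roughly the following (a sketch of the approach, not the exact code in the PR; it assumes pygam's `LogisticGAM`):

```python
# Sketch of GAM-based calibration (not the exact PR #80 code): fit a
# logistic GAM mapping raw scores to treatment labels and use its
# predicted probabilities as the calibrated scores. Assumes pygam.
import numpy as np
from pygam import LogisticGAM, s

def calibrate_ps(ps_raw, treatment):
    """Return calibrated propensity scores given raw scores and labels."""
    gam = LogisticGAM(s(0)).fit(ps_raw.reshape(-1, 1), treatment)
    return gam.predict_proba(ps_raw.reshape(-1, 1))
```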

Thanks!

MaximilianFranz commented 4 years ago

Thanks for the reply! Cool to see the new calibrated version. It seems to work well, but it requires the true treatment labels of the test set in this plotting experiment (see the green curve compared with the orange one):

[screenshot: calibration curves, green (calibrated) vs. orange (uncalibrated)]

As to the question: I believe it is important to avoid estimation mistakes caused by the misuse of regressors or uncalibrated classifiers. I, for example, wasn't aware of the pitfalls and just 'threw some regressor' at the problem to get propensity estimates. So the major reasons for including it are ease of use and providing a guideline on how to do it properly. A toy sketch of that pitfall follows below.
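Something like this (the model choices here are just for demonstration):

```python
# Toy sketch of the pitfall: a clipped regressor output is not a
# probability, while a classifier's predict_proba at least aims to be one.
import numpy as np
from sklearn.linear_model import ElasticNetCV, LogisticRegression

rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 5))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))

# naive: regress the binary treatment and clip into (0, 1)
ps_naive = np.clip(ElasticNetCV().fit(X, t).predict(X), 0.01, 0.99)

# proper: a (possibly calibrated) classifier returns probabilities directly
ps_clf = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
```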

For now, I think we'll also stick to a minimal version providing logistic regression and GAMs!