py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Verbose logging in LinearDML and SparseLinearDML #922

carl-offerfit commented 1 week ago

I'm trying to fit LinearDML and SparseLinearDML on a marketing data set with 350,000 examples, 25 real-valued treatment variables, and 50 nuisance variables. The fitting takes a long time (and eventually gives warnings; more about that in another issue if I can't figure it out). For now, my question is whether there is any way to get logging. I understand there are multiple stages in the fitting process (multiple models to fit), and I don't know which of them takes so long. LinearDML and SparseLinearDML don't seem to accept the verbose parameters listed in the DML base class. Thanks for your help!
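One partial workaround in the meantime (a sketch, not an official EconML logging feature: model_y and model_t are real LinearDML parameters, but the estimator choices and their verbose flags below are illustrative scikit-learn settings):

```python
# Sketch: LinearDML has no verbose flag of its own, but it accepts custom
# first-stage estimators, so progress output can come from scikit-learn.
from econml.dml import LinearDML
from sklearn.ensemble import RandomForestRegressor

est = LinearDML(
    # verbose=1 makes each forest print per-tree progress during fit
    model_y=RandomForestRegressor(n_estimators=100, verbose=1, n_jobs=-1),
    model_t=RandomForestRegressor(n_estimators=100, verbose=1, n_jobs=-1),
    cv=2,
)
# Y: (n,) outcome, T: (n, 25) treatments, X: features, W: controls
est.fit(Y, T, X=X, W=W)
```

Watching which first-stage model is currently printing at least narrows down where the time is going.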

carl-offerfit commented 6 days ago

I stepped through the code and answered my own question: there is no additional info available. I am going to work on enhancing the logging to meet my needs, and if it works out I can contribute it back to the project. I am dealing with non-convergence of the final model, so I will most likely start by logging more diagnostics on the quality of the fit of the nuisance and treatment models.

kbattocchi commented 6 days ago

Hi Carl, I agree that keeping track of progress through fitting is functionality we should build into the library, and I have started working on that.

In terms of the quality of the fit, once fitting is complete one measure is accessible via the nuisance_scores_t and nuisance_scores_y attributes, but note that interpreting these can be somewhat subtle: you want your models to fit as well as possible, but for the DML techniques to work there needs to be unpredictable variation in the treatment, which should lead to unpredictable variation in the outcome if the effect is nonzero.
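For concreteness, a minimal way to read those attributes after fitting (the attribute names are as described above; the averaging is just an illustrative summary):

```python
import numpy as np

# Scores are nested per (cross-fitting iteration, fold); with the default
# scorers these are R^2 for regressors and accuracy for classifiers.
print("outcome model score:  ", np.mean(est.nuisance_scores_y))
print("treatment model score:", np.mean(est.nuisance_scores_t))
```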

carl-offerfit commented 6 days ago

Hi, thanks for your reply. I was thinking about making a branch and adding more detailed and familiar diagnostics on the first-stage models: for example, a confusion matrix on the first-stage outcome model (in case the outcome is binary), or other intuitive evaluation metrics like the correlation coefficient (for real-valued outcomes or treatments). I would also like to look at SHAP plots for the first-stage models (I have a good idea of what the influential features should be, and if the model did not find them I would know something is wrong). Does that make sense? And if I implemented such features behind a flag to turn them on/off, is that something other people would find useful?
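As a rough illustration of that kind of check done standalone (this mirrors, but is not, what EconML runs internally; the out-of-fold refit below is extra computation on top of est.fit):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Assume a binary outcome Y here; XW is the feature matrix the first stage sees.
XW = np.hstack([X, W])
outcome_model = RandomForestClassifier(n_estimators=100, n_jobs=-1)

# Out-of-fold predictions, analogous to DML's cross-fitted nuisance estimates
y_pred = cross_val_predict(outcome_model, XW, Y, cv=2)
print(confusion_matrix(Y, y_pred))

# A SHAP plot on a first-stage model would need a separate fit, e.g.:
#   import shap
#   explainer = shap.TreeExplainer(outcome_model.fit(XW, Y))
#   shap.summary_plot(explainer.shap_values(XW), XW)
```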

There is lots of unpredictable variation in the treatments: we pick them with contextual bandits that are refit multiple times per week on noisy data, and we always have a small percentage of completely random treatment assignment to help learning (we are using a version of epsilon-greedy contextual bandits).