py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Setting the regularization parameter optimally #254

Open federiconuta opened 4 years ago

federiconuta commented 4 years ago

Hi and thank you for the improvements!

I am facing the following error, even though I set the number of splits as suggested in issue #94. In particular, the warning about alpha reads:

```
Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
```

To address the error, I tried to increase the maximum number of iterations as follows:

```python
from econml.dml import SparseLinearDMLCateEstimator
from sklearn.linear_model import LassoCV, MultiTaskLassoCV
from sklearn.preprocessing import PolynomialFeatures

est = SparseLinearDMLCateEstimator(
    model_y=LassoCV(cv=[(fold00, fold11), (fold11, fold00)]),
    model_t=MultiTaskLassoCV(cv=[(fold00, fold11), (fold11, fold00)]),
    n_splits=[(fold0, fold1), (fold1, fold0)],
    linear_first_stages=False,
    featurizer=PolynomialFeatures(degree=4, include_bias=False),
    max_iter=100000)
```

But the issue remains. Should the cross-validation in the first and final stages be left automatic instead?

Thank you,

Federico

kbattocchi commented 4 years ago

It might help if you could say a bit more about your data (e.g., what are the dimensions of Y, T, X, and W?). One thing that jumps out is that polynomial features of degree 4 might be generating quite a large number of features if X has many columns (there will be roughly n^4/24 featurized columns if X has n columns, which is then multiplied by the number of treatments), which could naturally make the optimization problem more difficult.
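As a rough check on that growth, one can count the featurized columns directly with scikit-learn; the column widths below are illustrative, not the issue author's actual data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Count the output columns of a degree-4 featurizer for a few widths of X.
# The count grows like n^4/24, so wide X matrices blow up quickly.
for n_cols in (4, 8, 12):
    pf = PolynomialFeatures(degree=4, include_bias=False).fit(np.zeros((1, n_cols)))
    print(n_cols, "->", pf.n_output_features_)
# 4 -> 69, 8 -> 494, 12 -> 1819
```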

federiconuta commented 4 years ago

Thank you for the kind reply,

So I removed the polynomial option already. The problem seems to lie in model_y and model_t, specifically in the max_iter, tolerance and l1_ratio parameters. My data have the following dimensions: Y.shape = (576,), T.shape = (576, 2), X.shape = (576, 12) (here I control for fixed effects), and finally W.shape = (576, 65), so quite a large set of controls in W. I was wondering whether it is possible to set different alphas via the alphas parameter of model_y and model_t and inspect the results, but I have not found a way so far.

Thanks again

federiconuta commented 4 years ago

Maybe to give you more context, I am trying the following:

est = SparseLinearDMLCateEstimator(model_y = ElasticNetCV(cv=[(fold00, fold11), (fold11, fold00)]), model_t = MultiTaskElasticNetCV(cv=[(fold00, fold11), (fold11, fold00)]), n_splits = [(fold0, fold1), (fold1, fold0)], linear_first_stages=False, max_iter=100000)

where folds have been copied from those in issue #94 . Then I am doing:

```python
est.fit(Y, T, X[:, :(n_products)], W, inference='debiasedlasso')
```

My data are basically prices and quantities in 12 EU countries observed over 48 quarters from 1996 to 2008. Log prices, log quantities and the other variables have shape (12, 48).
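For concreteness, a (12, 48) country-by-quarter panel flattens into the (576,) outcome vector described above; this is a minimal sketch with synthetic numbers, not the actual data:

```python
import numpy as np

# Hypothetical panel matching the shapes in the thread: 12 countries x 48 quarters
rng = np.random.default_rng(0)
log_quantities = rng.normal(size=(12, 48))

# One row per (country, quarter) observation: 12 * 48 = 576
Y = log_quantities.reshape(-1)
print(Y.shape)  # (576,)
```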

kbattocchi commented 4 years ago

If the problem is in the first stages, then it shouldn't be affected by the featurization. In that case, since you've specified linear_first_stages=False, we're basically just regressing T and Y on concat([X, W]) (except that we're using subsets of each array based on the outer and inner folds). ElasticNetCV does support an argument alphas to its initializer to constrain the values to a fixed set, so you could give that a shot.
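To illustrate the `alphas` argument on synthetic data (the grid below is an assumed example, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic regression problem standing in for the first-stage data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.1 * rng.normal(size=100)

# Constrain the cross validation to a fixed grid of penalty values
alphas = np.logspace(-5, 1, 20)
model_y = ElasticNetCV(alphas=alphas, max_iter=100000).fit(X, y)

# The selected penalty is guaranteed to come from the supplied grid
print(model_y.alpha_ in alphas)  # True
```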

federiconuta commented 4 years ago

Ok. I see.

So what I can do is set a list of alphas and loop over them, as in:

```python
alphas = np.logspace(-5, 1, 20)
train_errors = list()
test_errors = list()
est_UE_t = {}
for alpha in alphas:
    est_UE_t[alpha] = SparseLinearDMLCateEstimator(
        model_y=ElasticNetCV(cv=[(fold00, fold11), (fold11, fold00)], max_iter=10000, alphas=[alpha]),
        model_t=MultiTaskElasticNetCV(cv=[(fold00, fold11), (fold11, fold00)], max_iter=10000, alphas=[alpha]),
        n_splits=[(fold0, fold1), (fold1, fold0)],
        linear_first_stages=False,
        max_iter=10000)
    est_UE_t[alpha].fit(Y, T, X[:, :(n_products)], W, inference='debiasedlasso')
    train_errors.append(est_UE_t[alpha].score(Y, T, Xl[:, :(n_products)], Wl))
    test_errors.append(enet.score(X_test, y_test))
```

If so, is there a way to select the train and test units chosen by the inner and outer folds for Y,  concat([X, W]) and T?

kbattocchi commented 4 years ago

Sorry, now I'm a bit confused about what you're trying to accomplish - I thought you wanted to constrain the cross validation to select among a particular fixed set of alphas.

The 'CV' part of ElasticNetCV is all about using cross validation to pick the best alpha from some set (either an array specified by the alphas argument, or a default set generated by sklearn); if you're going to set just one alpha inside a loop through a fixed set of values yourself then you might as well use a plain ElasticNet (and then you don't need to worry about the inner cross validation folds at all, because there is no inner cross validation). But then this is like doing the cross validation yourself instead of letting sklearn handle it.

The point of Vasilis's example code for setting up fold0, fold1, fold00, and fold11 was to ensure that the products and months are distributed correctly in both the inner and outer folds, so you shouldn't need to change anything else, assuming your data has the same general format as what was used in his example; if you don't perform cross validation inside model_y or model_t, then you only need the outer folds (fold0 and fold1).
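Doing the alpha selection manually with a plain ElasticNet and only the outer folds could be sketched as follows; fold0 and fold1 stand in for the outer fold indices from issue #94, and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data standing in for one first-stage regression
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.1 * rng.normal(size=100)

# Assumed outer folds (replace with the fold indices from issue #94)
fold0, fold1 = np.arange(50), np.arange(50, 100)

# No inner CV: fit one plain ElasticNet per alpha and score out-of-fold
scores = {}
for alpha in np.logspace(-5, 1, 20):
    enet = ElasticNet(alpha=alpha, max_iter=100000)
    enet.fit(X[fold0], y[fold0])
    scores[alpha] = enet.score(X[fold1], y[fold1])  # out-of-fold R^2

best_alpha = max(scores, key=scores.get)
print(best_alpha)
```

This is essentially what the CV estimators do internally, just made explicit so the per-alpha scores can be inspected.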

Also, stepping back a bit, it's possible that the "Objective did not converge" message can simply be ignored. One reason it might appear is that a subset of the alpha values attempted during cross validation doesn't lead to good results; but if the optimization does converge for other values of alpha, then presumably those would be chosen by the cross validation procedure because they'll have better out-of-sample performance (or maybe the coefficients from one of the non-converged attempts still perform well enough out-of-sample to beat them, which is okay too).
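If the message is indeed ignorable, it can be silenced explicitly; a minimal sketch on synthetic data (max_iter is kept deliberately tiny here just to provoke the warning):

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNetCV

# Synthetic data standing in for a first-stage fit
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = rng.normal(size=50)

# Mute the per-alpha convergence warnings during cross validation.
# Only safe when the alphas that do converge win the CV comparison anyway.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    model = ElasticNetCV(alphas=np.logspace(-8, 1, 10), max_iter=5).fit(X, y)

print(model.alpha_)
```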

federiconuta commented 4 years ago

I see. So thank you a lot. Yes, that is exactly what I wanted to accomplish.