py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

'GRFTree' object has no attribute 'n_features_in_' #573

Open JFernandoGR opened 2 years ago

JFernandoGR commented 2 years ago

Hi.

I just installed EconML and I am having problems calculating heterogeneous effects for each individual in my test set.

This is not a problem with my PC but with the library in general: I hit the same problem when running the code in Google Colab, and even after completely reinstalling Anaconda:

CODE:

causal_forest_model = CausalForestDML(discrete_treatment=True, model_t = RandomForestClassifier(random_state = 1234), model_y= RandomForestRegressor(random_state=1234, max_features = "sqrt", n_estimators = 1800,max_depth = 10, min_samples_leaf = 55), n_estimators = 1000)
causal_forest_model.tune(y_train, t_train2, X=X_train_scaled)
causal_forest_model.fit(y_train, t_train2, X=X_train_scaled)
df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

ATTRIBUTE ERROR:

----> 5 df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

C:\ProgramData\Anaconda3\lib\site-packages\econml\_cate_estimator.py in effect(self, X, T0, T1)
    860     def effect(self, X=None, *, T0=0, T1=1):
    861         # NOTE: don't explicitly expand treatments here, because it's done in the super call
--> 862         return super().effect(X, T0=T0, T1=T1)
    863     effect.__doc__ = BaseCateEstimator.effect.__doc__
    864

C:\ProgramData\Anaconda3\lib\site-packages\econml\_cate_estimator.py in effect(self, X, T0, T1)
    588         # TODO: what if input is sparse? - there's no equivalent to einsum,
    589         # but tensordot can't be applied to this problem because we don't sum over m
--> 590         eff = self.const_marginal_effect(X)
    591         # if X is None then the shape of const_marginal_effect will be wrong because the number
    592         # of rows of T was not taken into account

C:\ProgramData\Anaconda3\lib\site-packages\econml\_ortho_learner.py in const_marginal_effect(self, X)
    790             return self._ortho_learner_model_final.predict()
    791         else:
--> 792             return self._ortho_learner_model_final.predict(X)
    793     const_marginal_effect.__doc__ = LinearCateEstimator.const_marginal_effect.__doc__
    794

C:\ProgramData\Anaconda3\lib\site-packages\econml\dml\_rlearner.py in predict(self, X)
    100
    101     def predict(self, X=None):
--> 102         return self._model_final.predict(X)
    103
    104     def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None):

C:\ProgramData\Anaconda3\lib\site-packages\econml\dml\causal_forest.py in predict(self, X)
     93
     94     def predict(self, X):
---> 95         return self._model.predict(self._combine(X, fitting=False)).reshape((-1,) + self._d_y + self._d_t)
     96
     97     @property

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\classes.py in predict(self, X, interval, alpha)
     44             return np.moveaxis(np.array(pred), 0, 1), np.moveaxis(np.array(lb), 0, 1), np.moveaxis(np.array(ub), 0, 1)
     45         else:
---> 46             pred = [estimator.predict(X, interval=interval, alpha=alpha) for estimator in self.estimators_]
     47             return np.moveaxis(np.array(pred), 0, 1)
     48

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\classes.py in <listcomp>(.0)
     44             return np.moveaxis(np.array(pred), 0, 1), np.moveaxis(np.array(lb), 0, 1), np.moveaxis(np.array(ub), 0, 1)
     45         else:
---> 46             pred = [estimator.predict(X, interval=interval, alpha=alpha) for estimator in self.estimators_]
     47             return np.moveaxis(np.array(pred), 0, 1)
     48

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict(self, X, interval, alpha)
    836                     lb[:, :self.n_relevant_outputs_], ub[:, :self.n_relevant_outputs_])
    837         else:
--> 838             y_hat = self.predict_full(X, interval=False)
    839             if self.n_relevant_outputs_ == self.n_outputs_:
    840                 return y_hat

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict_full(self, X, interval, alpha)
    804                 ub[:, t] = pred_dist.ppf(1 - (alpha / 2))
    805             return point, lb, ub
--> 806         return self._predict_point_and_var(X, full=True, point=True, var=False)
    807
    808     def predict(self, X, interval=False, alpha=0.05):

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in _predict_point_and_var(self, X, full, point, var, project, projector)
    685         """
    686
--> 687         alpha, jac = self.predict_alpha_and_jac(X)
    688         invjac = np.linalg.pinv(jac)
    689         parameter = np.einsum('ijk,ik->ij', invjac, alpha)

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict_alpha_and_jac(self, X, slice, parallel)
    621         check_is_fitted(self)
    622         # Check data
--> 623         X = self._validate_X_predict(X)
    624
    625         # Assign chunk of trees to jobs

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in _validate_X_predict(self, X)
    461         check_is_fitted(self)
    462
--> 463         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    464
    465     def predict_tree_average_full(self, X):

C:\ProgramData\Anaconda3\lib\site-packages\econml\tree\_tree_classes.py in _validate_X_predict(self, X, check_input)
    284
    285         n_features = X.shape[1]
--> 286         if self.n_features_in_ != n_features:
    287             raise ValueError("Number of features of the model must "
    288                              "match the input. Model n_features is %s and "

AttributeError: 'GRFTree' object has no attribute 'n_features_in_'

kbattocchi commented 2 years ago

Could you please include the output of pip list? It sounds like perhaps an incompatibility between the versions of sklearn and econml that you have installed.
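
If the full pip list output is long, a short script can report only the packages that matter here. The following is a minimal sketch using the standard library's importlib.metadata; the package list and the helper name installed_versions are illustrative choices, not something econml provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages relevant to this thread (an assumed list; adjust as needed).
PACKAGES = ["econml", "scikit-learn", "dowhy", "numpy"]


def installed_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in this environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver}")
```

Running this in the same environment (or notebook kernel) that raises the error matters, since Colab and a local Anaconda install may resolve different versions.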

JFernandoGR commented 2 years ago

Here is the output of pip list:

scikit-image 0.18.3
scikit-learn 0.24.2
scikit-learn-intelex 2021.20210714.120553
dowhy 0.6
econml 0.13.0

JFernandoGR commented 2 years ago

@kbattocchi Thanks so much for the hints, but it is weird because last month the library worked without any problem.

kbattocchi commented 2 years ago

Oddly, with scikit-learn 0.24.2 and econml 0.13 on my machine, I do not see this error, nor do I see it when running the following on Google Colab:

%pip install econml

from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import numpy as np
import pandas as pd

y_train = np.random.normal(size=1000)
t_train2 = np.random.choice([0, 1], size=1000)
X_train_scaled = np.random.normal(size=(1000, 10))
X_test_scaled = np.random.normal(size=(1000, 10))
df_predict_random1 = pd.DataFrame()

causal_forest_model = CausalForestDML(discrete_treatment=True, model_t = RandomForestClassifier(random_state = 1234), model_y= RandomForestRegressor(random_state=1234, max_features = "sqrt", n_estimators = 1800,max_depth = 10, min_samples_leaf = 55), n_estimators = 1000)
causal_forest_model.tune(y_train, t_train2, X=X_train_scaled)
causal_forest_model.fit(y_train, t_train2, X=X_train_scaled)
df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

Could you provide a simple repro that includes dummy data?

Also, if you're using Anaconda, could you also provide the output of conda list?

econml 0.13 should be compatible with both scikit-learn 1.0 and scikit-learn 0.24, while econml 0.12 was only compatible with the latter, but perhaps you've found a corner case we did not sufficiently test.

kbattocchi commented 2 years ago

It looks like when econml is upgraded in place, some of our native components are (incorrectly) not rebuilt because of how we use Cython.

Uninstalling and then reinstalling econml should be a viable workaround for now; sorry for the inconvenience and we'll try to get this fixed in an update soon. Thanks for bringing this to our attention!
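
For anyone who wants to script the workaround, here is a minimal sketch that runs pip through the current interpreter so the reinstall targets the same environment as the failing code. The helper names reinstall_commands and reinstall are hypothetical, and passing --no-cache-dir is an extra precaution assumed here (to avoid reusing a cached build), not something confirmed as necessary above.

```python
import subprocess
import sys


def reinstall_commands(package="econml"):
    """Build the pip commands for a clean uninstall followed by a fresh install."""
    pip = [sys.executable, "-m", "pip"]  # use the pip tied to this interpreter
    return [
        pip + ["uninstall", "-y", package],
        # Assumption: skipping the cache avoids picking up a stale build.
        pip + ["install", "--no-cache-dir", package],
    ]


def reinstall(package="econml"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in reinstall_commands(package):
        subprocess.run(cmd, check=True)
```

Calling reinstall() performs the two steps. In a notebook, the kernel should be restarted afterwards so the freshly installed modules are the ones that get imported.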

JFernandoGR commented 2 years ago

Thanks so much for your answers!