'GRFTree' object has no attribute 'n_features_in_'

JFernandoGR commented 2 years ago

Hi.

I just installed 'EconML' and I am having problems calculating heterogeneous effects for each individual in my test set.

This is not a problem with my PC but with the library in general, as I see the same problem when running the code in Google Colab, and even after reinstalling Anaconda completely:

CODE:

causal_forest_model = CausalForestDML(discrete_treatment=True, model_t = RandomForestClassifier(random_state = 1234), model_y= RandomForestRegressor(random_state=1234, max_features = "sqrt", n_estimators = 1800,max_depth = 10, min_samples_leaf = 55), n_estimators = 1000)
causal_forest_model.tune(y_train, t_train2, X=X_train_scaled)
causal_forest_model.fit(y_train, t_train2, X=X_train_scaled)
df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

ATTRIBUTE ERROR:

----> 5 df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

C:\ProgramData\Anaconda3\lib\site-packages\econml\_cate_estimator.py in effect(self, X, T0, T1)
    860     def effect(self, X=None, *, T0=0, T1=1):
    861         # NOTE: don't explicitly expand treatments here, because it's done in the super call
--> 862         return super().effect(X, T0=T0, T1=T1)
    863     effect.__doc__ = BaseCateEstimator.effect.__doc__
    864

C:\ProgramData\Anaconda3\lib\site-packages\econml\_cate_estimator.py in effect(self, X, T0, T1)
    588         # TODO: what if input is sparse? - there's no equivalent to einsum,
    589         #       but tensordot can't be applied to this problem because we don't sum over m
--> 590         eff = self.const_marginal_effect(X)
    591         # if X is None then the shape of const_marginal_effect will be wrong because the number
    592         # of rows of T was not taken into account

C:\ProgramData\Anaconda3\lib\site-packages\econml\_ortho_learner.py in const_marginal_effect(self, X)
    790             return self._ortho_learner_model_final.predict()
    791         else:
--> 792             return self._ortho_learner_model_final.predict(X)
    793     const_marginal_effect.__doc__ = LinearCateEstimator.const_marginal_effect.__doc__
    794

C:\ProgramData\Anaconda3\lib\site-packages\econml\dml\_rlearner.py in predict(self, X)
    100
    101     def predict(self, X=None):
--> 102         return self._model_final.predict(X)
    103
    104     def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None):

C:\ProgramData\Anaconda3\lib\site-packages\econml\dml\causal_forest.py in predict(self, X)
     93
     94     def predict(self, X):
---> 95         return self._model.predict(self._combine(X, fitting=False)).reshape((-1,) + self._d_y + self._d_t)
     96
     97     @property

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\classes.py in predict(self, X, interval, alpha)
     44             return np.moveaxis(np.array(pred), 0, 1), np.moveaxis(np.array(lb), 0, 1), np.moveaxis(np.array(ub), 0, 1)
     45         else:
---> 46             pred = [estimator.predict(X, interval=interval, alpha=alpha) for estimator in self.estimators_]
     47             return np.moveaxis(np.array(pred), 0, 1)
     48

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\classes.py in <listcomp>(.0)
     44             return np.moveaxis(np.array(pred), 0, 1), np.moveaxis(np.array(lb), 0, 1), np.moveaxis(np.array(ub), 0, 1)
     45         else:
---> 46             pred = [estimator.predict(X, interval=interval, alpha=alpha) for estimator in self.estimators_]
     47             return np.moveaxis(np.array(pred), 0, 1)
     48

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict(self, X, interval, alpha)
    836                     lb[:, :self.n_relevant_outputs_], ub[:, :self.n_relevant_outputs_])
    837         else:
--> 838             y_hat = self.predict_full(X, interval=False)
    839             if self.n_relevant_outputs_ == self.n_outputs_:
    840                 return y_hat

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict_full(self, X, interval, alpha)
    804                 ub[:, t] = pred_dist.ppf(1 - (alpha / 2))
    805             return point, lb, ub
--> 806         return self._predict_point_and_var(X, full=True, point=True, var=False)
    807
    808     def predict(self, X, interval=False, alpha=0.05):

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in _predict_point_and_var(self, X, full, point, var, project, projector)
    685         """
    686
--> 687         alpha, jac = self.predict_alpha_and_jac(X)
    688         invjac = np.linalg.pinv(jac)
    689         parameter = np.einsum('ijk,ik->ij', invjac, alpha)

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in predict_alpha_and_jac(self, X, slice, parallel)
    621         check_is_fitted(self)
    622         # Check data
--> 623         X = self._validate_X_predict(X)
    624
    625         # Assign chunk of trees to jobs

C:\ProgramData\Anaconda3\lib\site-packages\econml\grf\_base_grf.py in _validate_X_predict(self, X)
    461         check_is_fitted(self)
    462
--> 463         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    464
    465     def predict_tree_average_full(self, X):

C:\ProgramData\Anaconda3\lib\site-packages\econml\tree\_tree_classes.py in _validate_X_predict(self, X, check_input)
    284
    285         n_features = X.shape[1]
--> 286         if self.n_features_in_ != n_features:
    287             raise ValueError("Number of features of the model must "
    288                              "match the input. Model n_features is %s and "

AttributeError: 'GRFTree' object has no attribute 'n_features_in_'

kbattocchi commented 2 years ago

Could you please include the output of pip list? It sounds like perhaps an incompatibility between the versions of sklearn and econml that you have installed.
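As a lighter alternative to pasting the whole pip list output, the versions that matter here could be collected with a short script. This is only a minimal sketch using the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of econml.

```python
from importlib import metadata

# Package names taken from this thread; adjust the tuple as needed.
PACKAGES = ("econml", "scikit-learn", "dowhy")


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in the same environment (or notebook kernel) that raises the error matters, since a different interpreter may see different packages.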

JFernandoGR commented 2 years ago

Here is the relevant output of pip list:

scikit-image 0.18.3
scikit-learn 0.24.2
scikit-learn-intelex 2021.20210714.120553
dowhy 0.6
econml 0.13.0

JFernandoGR commented 2 years ago

@kbattocchi Thanks so much for the hints, but it is weird because last month the library worked without any problem.

kbattocchi commented 2 years ago

Oddly, with scikit-learn 0.24.2 and econml 0.13 on my machine, I do not see this error, nor do I see it when running the following on Google Colab:

%pip install econml

from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import numpy as np
import pandas as pd

y_train = np.random.normal(size=1000)
t_train2 = np.random.choice([0, 1], size=1000)
X_train_scaled = np.random.normal(size=(1000, 10))
X_test_scaled = np.random.normal(size=(1000, 10))
df_predict_random1 = pd.DataFrame()

causal_forest_model = CausalForestDML(discrete_treatment=True, model_t = RandomForestClassifier(random_state = 1234), model_y= RandomForestRegressor(random_state=1234, max_features = "sqrt", n_estimators = 1800,max_depth = 10, min_samples_leaf = 55), n_estimators = 1000)
causal_forest_model.tune(y_train, t_train2, X=X_train_scaled)
causal_forest_model.fit(y_train, t_train2, X=X_train_scaled)
df_predict_random1['Control'] = pd.Series(causal_forest_model.effect(X_test_scaled, T1 = 0))

Could you provide a simple repro that includes dummy data?

Also, if you're using Anaconda, could you also provide the output of conda list?

econml 0.13 should be compatible with both scikit-learn 1.0 and scikit-learn 0.24, while econml 0.12 was only compatible with the latter, but perhaps you've found a corner case we did not sufficiently test.

kbattocchi commented 2 years ago

It looks like when econml is upgraded in place, some of our native components are (incorrectly) not rebuilt because of how we use Cython.
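One rough way to look for leftover native components is to list the compiled extension files inside the installed package together with their modification times. The sketch below uses only the standard library; the helper name `list_compiled_extensions` is hypothetical, and it assumes that stale files would show noticeably older timestamps than the rest of the install, which may not always hold.

```python
import importlib.util
from datetime import datetime
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def list_compiled_extensions(package_name):
    """Return sorted paths of compiled extension files inside a package.

    Returns an empty list if the package is not installed or is a single module.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            # EXTENSION_SUFFIXES holds the platform's suffixes (e.g. .so or .pyd variants)
            if path.is_file() and any(path.name.endswith(s) for s in EXTENSION_SUFFIXES):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in list_compiled_extensions("econml"):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        print(f"{modified:%Y-%m-%d %H:%M}  {path}")
```

This only inspects files on disk and does not import econml, so it can be run even when the package itself fails at prediction time.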

Uninstalling and then reinstalling econml should be a viable workaround for now; sorry for the inconvenience and we'll try to get this fixed in an update soon. Thanks for bringing this to our attention!
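The uninstall-then-reinstall workaround can be scripted so that it targets the same interpreter that runs the failing code. This is a hedged sketch, not an official procedure: the function names are illustrative, and the `--no-cache-dir` flag is an added precaution that this thread does not mention.

```python
import subprocess
import sys


def reinstall_commands(package="econml"):
    """Build the pip commands for a clean uninstall followed by a fresh install."""
    # Use "python -m pip" so the commands act on the current interpreter's environment.
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "--yes", package],
        # --no-cache-dir is an assumption (extra precaution), not from the thread.
        pip + ["install", "--no-cache-dir", package],
    ]


def reinstall(package="econml"):
    """Run the uninstall and install steps in order, stopping on the first failure."""
    for command in reinstall_commands(package):
        subprocess.run(command, check=True)


# To apply the workaround, call reinstall() explicitly:
# reinstall()
```

In a notebook such as Google Colab, the runtime most likely needs to be restarted afterwards, because modules that were already imported stay loaded in the running process.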

JFernandoGR commented 2 years ago

Thanks so much for your answers!

py-why / EconML

'GRFTree' object has no attribute 'n_features_in_' #573