py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Problem with DeepIV and tensorflow #537

Open juandavidgutier opened 2 years ago

juandavidgutier commented 2 years ago

Hello @kbattocchi,

I am working with DeepIV and reproducing the code available at (https://towardsdatascience.com/causal-ml-for-data-science-deep-learning-with-instrumental-variables-96e5b7cc0482). Following issue (#352), I installed tensorflow 2.2. However, my script gives me the error: ImportError: keras and tensorflow are no longer dependencies of the main econml package; install econml[tf] or econml[all] to require them, or install them separately, to use DeepIV.

Unfortunately, I do not understand the error message well enough to solve the problem. Could you share an example with the lines of code needed to fix it?

I really appreciate your help.

kbattocchi commented 2 years ago

Could you please include the output of pip list, as well as letting us know the version of python you're using?

That error message means that if you want to use DeepIV, you should either run pip install econml[tf] or pip install the plain econml package and separately install tensorflow<2.3, rather than just pip install econml. However, if you've already installed tensorflow 2.2 then just pip installing econml should be fine, so perhaps our logic for showing that error message is wrong and actually there's some other problem, which I'd love to track down.
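
For reference, here is a minimal sketch of one way to collect the relevant version information from inside Python rather than from the full pip list output. It assumes Python 3.8 or later (for importlib.metadata); the helper name installed_versions is hypothetical and not part of econml.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the interpreter version and the packages discussed in this thread.
print("python", sys.version.split()[0])
for name, version in installed_versions(["econml", "tensorflow", "keras"]).items():
    print(name, version or "not installed")
```

Comparing the printed tensorflow and keras versions against the bounds mentioned in this thread (tensorflow<2.3, keras<2.4) should show quickly whether the environment matches what DeepIV expects.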

juandavidgutier commented 2 years ago

Hello @kbattocchi,

Thanks for your answer. I have python 3.8 and here is the output of pip list: modules.csv

kbattocchi commented 2 years ago

I think the issue is that your version of keras is higher than we support; could you try installing keras<2.4 and see if that unblocks you?

juandavidgutier commented 2 years ago

@kbattocchi, I installed keras 2.3 (pip install keras==2.3), but unfortunately I get the same error message: "ImportError: keras and tensorflow are no longer dependencies of the main econml package; install econml[tf] or econml[all] to require them, or install them separately, to use DeepIV"

kbattocchi commented 2 years ago

I believe that this is the code that we're trying and failing to call that is resulting in that message:

import keras
from keras import backend as K
import keras.layers as L
from keras.models import Model

Could you try running this code and seeing if it produces a more meaningful error message?

juandavidgutier commented 2 years ago

@kbattocchi, I changed the keras lines as you suggested, but unfortunately the script produces the same error. I do not know what is wrong with the script; here is my code:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import econml
from econml.iv.nnet import DeepIV
import keras
from keras import backend as K
import keras.layers as L
from keras.models import Model

# load stata data and rename columns
df = pd.read_csv('D:/clases/UDES/DeepIV/hard_traveling_dataset.csv')

df.rename(columns={'oe_bright_30': 'obstruction',
                   'oe_lf_1_bright30': 'protection',
                   'iv_bright_30': 'iv_obstruction',
                   'iv_lf_1_bright30': 'iv_protection'}, inplace=True)

# normalize variables and change inf/nan to 0
for var in ['obstruction', 'protection', 'iv_obstruction', 'iv_protection']:
    df[var] = df[var]/df[var].mean()

df.replace(np.inf, 0, inplace=True)
df.replace(np.nan, 0, inplace=True)

# dummies
governorate_dummies = [f"g{i}" for i in range(0, 11)]
checkpoint_dummies = [f"checkpoint{i}" for i in range(1, 11)]
partial_checkpoint_dummies = [f"partialcheckpoint{i}" for i in range(1, 11)]
roadgate_dummies = [f"roadgate{i}" for i in range(1, 11)]
greenlinecheckpoint_dummies = [f"greenlinecheckpoint{i}" for i in range(1, 11)]
earthmound_dummies = [f"earthmound{i}" for i in range(1, 11)]
settle_dummies = [f"settlein{i}km" for i in range(1000, 11000, 1000)]

all_dummies = (governorate_dummies + checkpoint_dummies + partial_checkpoint_dummies
               + roadgate_dummies + greenlinecheckpoint_dummies + earthmound_dummies
               + settle_dummies)

# split df by "population_total", separating the small peripheral neighbourhoods (per)
# from the larger, more central neighbourhoods
df_per = df[df['population_total'] <= 1884]
df_not_per = df[df['population_total'] >= 1885]

print(len(df_per), len(df_not_per))

# set variables for peripheral neighbourhoods: outcome (y), treatment (t),
# covariates (x), instruments (z) and convert to arrays
y = (df_per['chng_employment']).to_numpy()
t = (df_per[['obstruction', 'protection']]).to_numpy()
x = (df_per[all_dummies]).to_numpy()
z = (df_per[['iv_obstruction', 'iv_protection']]).to_numpy()

# set variables for central neighbourhoods: outcome (y2), treatment (t2),
# covariates (x2), instruments (z2) and convert to arrays
y2 = (df_not_per['chng_employment']).to_numpy()
t2 = (df_not_per[['obstruction', 'protection']]).to_numpy()
x2 = (df_not_per[all_dummies]).to_numpy()
z2 = (df_not_per[['iv_obstruction', 'iv_protection']]).to_numpy()

# deep neural networks
treatment_ = keras.Sequential([L.Dense(128, activation='relu', input_shape=(73,)),
                               L.Dropout(0.17),
                               L.Dense(64, activation='relu'),
                               L.Dropout(0.17),
                               L.Dense(32, activation='relu'),
                               L.Dropout(0.17)])

outcome_ = keras.Sequential([L.Dense(128, activation='relu', input_shape=(73,)),
                             L.Dropout(0.17),
                             L.Dense(64, activation='relu'),
                             L.Dropout(0.17),
                             L.Dense(32, activation='relu'),
                             L.Dropout(0.17),
                             L.Dense(1)])

# code adapted from https://microsoft.github.io/dowhy/example_notebooks/dowhy-conditional-treatment-effects.html
keras_fit_options_1 = {"epochs": 50,
                       "validation_split": 0.1,
                       "callbacks": [keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)]}
keras_fit_options_2 = {"epochs": 100,
                       "validation_split": 0.1,
                       "callbacks": [keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)]}

deepIvEst_per = DeepIV(n_components=15,
                       m=lambda z, x: treatment_(L.concatenate([z, x])),
                       h=lambda t, x: outcome_(L.concatenate([t, x])),
                       n_samples=1,
                       use_upper_bound_loss=True,
                       n_gradient_samples=0,
                       optimizer='Adagrad',
                       first_stage_options=keras_fit_options_2,
                       second_stage_options=keras_fit_options_1)

deepIvEst_not_per = DeepIV(n_components=15,
                           m=lambda z, x: treatment_(L.concatenate([z, x])),
                           h=lambda t, x: outcome_(L.concatenate([t, x])),
                           n_samples=1,
                           use_upper_bound_loss=True,
                           n_gradient_samples=0,
                           optimizer='Adagrad',
                           first_stage_options=keras_fit_options_1,
                           second_stage_options=keras_fit_options_1)

# HERE IS THE ERROR
deepIvEst_per.fit(Y=y, T=t, X=x, Z=z)
deepIvEst_not_per.fit(Y=y2, T=t2, X=x2, Z=z2)
```

kbattocchi commented 2 years ago

Could you please include the full stack trace of the exception?

Also, what if you just run the four lines I suggested in isolation in a new script, rather than including them as part of the script you were already using? (I'm hoping you'll see a more informative error message, not that it would fix your problem)
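
A minimal sketch of one way to capture the full stack trace for each of those imports, assuming only the Python standard library; the helper name try_import is hypothetical and not part of econml or keras:

```python
import importlib
import traceback


def try_import(module_name):
    """Return (True, None) on success, or (False, formatted traceback) on failure."""
    try:
        importlib.import_module(module_name)
        return True, None
    except Exception:
        # format_exc() returns the complete stack trace as a string
        return False, traceback.format_exc()


# The modules behind the four import lines suggested earlier in this thread.
for name in ["keras", "keras.backend", "keras.layers", "keras.models"]:
    ok, trace = try_import(name)
    print(name, "OK" if ok else "FAILED")
    if not ok:
        print(trace)
```

Running this in a fresh script keeps the underlying exception visible instead of the generic ImportError shown when DeepIV is imported.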

juandavidgutier commented 2 years ago

@kbattocchi, I ran the four lines you suggested in isolation in a new script, and only this one shows a message:

import keras

2021-11-11 08:19:21.623384: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-11-11 08:19:21.623481: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Using TensorFlow backend.