juandavidgutier commented 3 years ago

Hello,

I am trying to estimate the effect of El Niño on the incidence of leishmaniasis. I used the method "backdoor.linear_regression" with test_significance=True and confidence_intervals=True. However, the reported confidence interval, [[1.02988048 2.0855936 ]], does not contain the mean value of the estimate (2.8158204337251664). I am confused about this, because I expected the confidence interval to include the mean value of the estimate.

Can anyone help me to understand what is happening?

I appreciate your help.

Here is my dataset: data.csv

And here is my code:

    import os, warnings, random
    import dowhy
    import econml
    from dowhy import CausalModel
    import pandas as pd
    import numpy as np

    # El Nino vs Neutral
    data_nino = pd.read_csv("data.csv")
    data_nino = data_nino.dropna()

    data_leish_nino = data_nino.drop(['Codigo.DANE.periodo', 'Codigo.DANE', 'consensoENSO'], axis=1)
    data_leish_nino.head()
    data_leish_nino = data_leish_nino.astype({"TF_consenso": 'bool'}, copy=False)

    # colombia
    colombia_nino = data_leish_nino

    # Step 1: Modeling the causal mechanism
    model_leish = CausalModel(
        data=colombia_nino,
        treatment=['TF_consenso'],
        outcome='incidencia100k',
        common_causes=['SST3.4'],
        effect_modifiers=['bosques'],
        frontdoor=['Temperature', 'Rainfall'],
        graph=("digraph {SST3.4->TF_consenso;SST3.4->incidencia100k;SST3.4->Temperature;"
               "SST3.4->Rainfall;TF_consenso->Temperature;TF_consenso->Rainfall;"
               "TF_consenso->incidencia100k;Temperature->incidencia100k;"
               "Rainfall->incidencia100k;bosques->incidencia100k;}"),
    )

    # view model
    model_leish.view_model()

    # Step 2: Identifying effects
    identified_estimand = model_leish.identify_effect(proceed_when_unidentifiable=True)
    print(identified_estimand)

    # Step 3: Estimation of the effect
    # ATE, significance and confidence interval
    estimate_bd = model_leish.estimate_effect(
        identified_estimand,
        method_name="backdoor.linear_regression",
        test_significance=True,
        confidence_intervals=True,
    )

    print(estimate_bd)
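
For reference, a normal-approximation (Wald) 95% interval for a regression coefficient is computed as estimate ± 1.96 × standard error, so it always contains the estimate it is built around. The minimal numpy sketch below, on hypothetical simulated data (the names `w`, `x`, `t`, `y` and all numbers are made up, not taken from data.csv), shows one way an interval can nevertheless miss the reported effect: if the model includes a treatment × effect-modifier interaction (as with `effect_modifiers=['bosques']`), the average effect is b_t + b_tx · mean(x), and an interval computed for the bare treatment coefficient b_t is centred on a different number. Whether DoWhy's `backdoor.linear_regression` actually computes its interval this way is an unverified assumption here; the sketch is only a way to reason about the hypothesis.

```python
import numpy as np

# Hypothetical simulated data (NOT the poster's dataset):
# w = common cause, x = effect modifier, t = binary treatment, y = outcome
rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=n)
x = rng.normal(loc=2.0, size=n)
t = (w + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.5 * t + 1.5 * w + 0.8 * t * x + rng.normal(size=n)

# Design matrix: intercept, t, w, and the t*x interaction term
X = np.column_stack([np.ones(n), t, w, t * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (homoskedastic) OLS covariance of the coefficients
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

z = 1.959963984540054  # two-sided 95% standard-normal quantile

# Wald interval for the bare treatment coefficient b_t: centred on beta[1]
ci_t_coef = (beta[1] - z * se[1], beta[1] + z * se[1])

# Average effect when the effect varies with x: b_t + b_tx * mean(x)
ate = beta[1] + beta[3] * x.mean()

# Wald interval for that linear combination (treats mean(x) as fixed)
g = np.array([0.0, 1.0, 0.0, x.mean()])
se_ate = np.sqrt(g @ cov @ g)
ci_ate = (ate - z * se_ate, ate + z * se_ate)

print("coefficient on t:", beta[1], "interval:", ci_t_coef)
print("average effect:  ", ate, "interval:", ci_ate)
```

Each interval contains its own estimate, but the interval for the bare coefficient does not contain the averaged effect, which is the same pattern as an interval of roughly [1.03, 2.09] next to an estimate of 2.82.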

amit-sharma commented 3 years ago

This is odd. I can try to look at this, but it may take some time.

juandavidgutier commented 3 years ago

@amit-sharma Thanks for the cooperation

jmafoster1 commented 3 years ago

I just had a similar thing with my own data. If you use the get_confidence_intervals method of the CausalEstimate class with argument method="bootstrap", that might return more sensible values. It did for me.
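A percentile bootstrap resamples the rows with replacement, re-runs the estimator on each resample, and takes the interval from the empirical quantiles of the resampled estimates. The plain-numpy sketch below illustrates that idea for a single OLS treatment coefficient on hypothetical simulated data; the helper names `ols_treatment_effect` and `percentile_bootstrap_ci` are made up for illustration, and this is not DoWhy's implementation of `method="bootstrap"`.

```python
import numpy as np


def ols_treatment_effect(t, w, y):
    """Hypothetical helper: OLS coefficient on t in y ~ 1 + t + w."""
    X = np.column_stack([np.ones_like(t), t, w])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def percentile_bootstrap_ci(t, w, y, num_simulations=1000, confidence_level=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap interval for the t coefficient."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(num_simulations)
    for b in range(num_simulations):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        estimates[b] = ols_treatment_effect(t[idx], w[idx], y[idx])
    alpha = 1.0 - confidence_level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Hypothetical simulated data with a true treatment effect of 2.0
rng = np.random.default_rng(1)
n = 500
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + 1.5 * w + rng.normal(size=n)

point = ols_treatment_effect(t, w, y)
lower, upper = percentile_bootstrap_ci(t, w, y)
print("point estimate:", point, "95% bootstrap interval:", (lower, upper))
```

With only a handful of resamples (for example `num_simulations=10`), the 2.5% and 97.5% quantiles are little more than the extremes of ten numbers, so the resulting interval is very noisy; several hundred to a few thousand resamples are commonly used for percentile intervals.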

juandavidgutier commented 3 years ago

@jmafoster1 Great!!! Thanks for the tip.

juandavidgutier commented 3 years ago

@jmafoster1 I followed your advice, but with a new dataset I ran into the same problem: the interval (0.1192, 0.2268) does not contain the mean value of the estimate (9.689e-17). Could the problem be caused by the very small mean value?

I am using this code to estimate the CI:

    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LassoCV
    from econml.inference import BootstrapInference

    dml_estimate_soiltemp = model_leish.estimate_effect(
        identified_estimand_soiltemp,
        target_units="ate",
        test_significance=True,
        # confidence_intervals=True,
        method_name="backdoor.econml.dml.DML",
        method_params={
            'init_params': {'model_y': GradientBoostingRegressor(),
                            'model_t': GradientBoostingRegressor(),
                            'featurizer': PolynomialFeatures(degree=1, include_bias=True),
                            'model_final': LassoCV(fit_intercept=False),
                            'random_state': 123},
            'fit_params': {'inference': BootstrapInference(n_bootstrap_samples=25, n_jobs=-1)},
        })

    # confidence interval with bootstrap soiltemp
    ci_Colombia_boost_soiltemp = dml_estimate_soiltemp.get_confidence_intervals(
        method="bootstrap", confidence_level=0.95, num_simulations=10, sample_size_fraction=0.7)
    print(ci_Colombia_boost_soiltemp)

jmafoster1 commented 3 years ago

I'm afraid I don't know how the confidence intervals code works, but it looks like you're using EconML as your estimator. I think they have their own methods to calculate confidence intervals. See https://microsoft.github.io/dowhy/example_notebooks/dowhy-conditional-treatment-effects.html#CATE-Object-and-Confidence-Intervals for details.

py-why/dowhy issue #326: Problem interpreting 95.0% confidence interval in backdoor.linear_regression
