py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

returning array([0.]) for p value of test_significance #1019

Open ianisadreamer opened 11 months ago

ianisadreamer commented 11 months ago

Hi,

I used estimate.test_stat_significance() to check whether the effect is significantly different from 0, but it returns a result that does not look meaningful, as shown below:

{'p_value': array([0.])}

What does it mean? Is there anything wrong?
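One possible reading, which this thread does not confirm, is that the p-value is not wrong but has underflowed: when a test statistic is very large, the tail probability falls below the smallest representable double and is reported as exactly 0. The sketch below shows this with a plain standard-normal approximation using only the Python standard library. The helper `two_sided_normal_p_value` is a hypothetical name for illustration, and this is not how DoWhy computes its p-value.

```python
import math


def two_sided_normal_p_value(z):
    """Two-sided p-value for a test statistic z under a standard normal.

    Hypothetical helper for illustration; not part of DoWhy.
    """
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# A moderate statistic gives an ordinary, non-zero p-value.
print(two_sided_normal_p_value(2.0))

# A very large statistic has a tail probability too small for a double,
# so the result is reported as exactly 0.0.
print(two_sided_normal_p_value(50.0))
```

If this is what is happening, the reported 0 would mean "smaller than floating point can represent" rather than a failed test. Inspecting the effect estimate together with its standard error is one way to check.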

amit-sharma commented 11 months ago

Can you post a minimum working example so we can reproduce the bug?

ianisadreamer commented 10 months ago

Thank you! Do you think this is enough?

`from sklearn import preprocessing

# create instance of label encoder
lab = preprocessing.LabelEncoder()

# perform label encoding
tmp_clean['is_matched'] = lab.fit_transform(tmp_clean['is_matched'])
tmp_clean['work_type'] = lab.fit_transform(tmp_clean['work_type'])
tmp_clean['report_violation_type'] = lab.fit_transform(tmp_clean['report_violation_type'])
tmp_clean['market_reporting'] = lab.fit_transform(tmp_clean['market_reporting'])
tmp_clean['detection_type'] = lab.fit_transform(tmp_clean['detection_type'])
tmp_clean['big_mac_segments'] = lab.fit_transform(tmp_clean['big_mac_segments'])
tmp_clean['is_admitted'] = lab.fit_transform(tmp_clean['is_admitted'])

from dowhy import CausalModel

model = CausalModel(
    data=tmp_clean,
    treatment='is_matched',
    outcome='decision_handle_time',
    common_causes=['work_type', 'market_reporting', 'report_violation_type',
                   'detection_type', 'big_mac_segments', 'is_admitted']
)
model.view_model()

estimands = model.identify_effect()
print(estimands)

estimate = model.estimate_effect(estimands, method_name="backdoor.linear_regression", test_significance=True)
print(estimate)

estimate.test_stat_significance()`

ianisadreamer commented 10 months ago

@amit-sharma It consistently returns {'p_value': array([0.])} for me. I found that when I manually build a different causal graph and pass it to CausalModel(), it returns a normal p-value. I'm wondering whether the issue is related to the common_causes passed to CausalModel or to the graph itself.

amit-sharma commented 10 months ago

I see, it may be due to the dataset too. To reproduce the error, I will need the data too. Can you share the tmp_clean dataframe? If that is not possible, please share a simulated dataset that reproduces the problem.
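A simulated dataset of the kind requested here could be sketched as follows. The column names come from the snippet posted above; everything else (the sample size, the number of category codes, the effect size, the noise level, and the helper names `simulate_tmp_clean` and `run_dowhy`) is an assumption made for illustration and does not describe the real tmp_clean data. Whether this reproduces the zero p-value is not known.

```python
import numpy as np
import pandas as pd

# Column names taken from the snippet in this thread.
COMMON_CAUSES = [
    "work_type", "market_reporting", "report_violation_type",
    "detection_type", "big_mac_segments", "is_admitted",
]


def simulate_tmp_clean(n_rows=5000, effect=2.0, seed=0):
    """Hypothetical stand-in for tmp_clean; all distributions are made up."""
    rng = np.random.default_rng(seed)
    # Label-encoded categorical confounders with integer codes 0..3.
    df = pd.DataFrame(
        {name: rng.integers(0, 4, size=n_rows) for name in COMMON_CAUSES}
    )
    confounding = df[COMMON_CAUSES].sum(axis=1).to_numpy(dtype=float)
    # Treatment probability rises with the confounders (logistic link).
    prob = 1.0 / (1.0 + np.exp(-(confounding - confounding.mean()) / 3.0))
    df["is_matched"] = rng.binomial(1, prob)
    # Outcome depends on the treatment, the confounders, and Gaussian noise.
    df["decision_handle_time"] = (
        10.0
        + effect * df["is_matched"]
        + 0.5 * confounding
        + rng.normal(0.0, 1.0, size=n_rows)
    )
    return df


def run_dowhy(df):
    """Run the same DoWhy calls as the snippet in this thread on df."""
    from dowhy import CausalModel  # imported here so simulation works without DoWhy

    model = CausalModel(
        data=df,
        treatment="is_matched",
        outcome="decision_handle_time",
        common_causes=COMMON_CAUSES,
    )
    estimands = model.identify_effect()
    estimate = model.estimate_effect(
        estimands,
        method_name="backdoor.linear_regression",
        test_significance=True,
    )
    return estimate.test_stat_significance()
```

Calling `run_dowhy(simulate_tmp_clean())` runs the same pipeline on the simulated data. Varying `effect` and `n_rows` would show whether the zero p-value tracks the strength of the effect or the size of the dataset.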