py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Counterfactual Samples giving invalid values of effect in dowhy gcm #1241

Open PMK1991 opened 4 weeks ago

PMK1991 commented 4 weeks ago

I am working on a heart disease dataset with a continuous treatment variable (thalach) and a categorical outcome (target). The treatment also affects another categorical variable (exang). The two are inversely related, i.e., when thalach increases, exang should decrease. However, when I intervene on thalach using counterfactual_samples, I get invalid values (-1) for exang, which should only take the values 0 and 1.


samples = gcm.counterfactual_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

Before and after intervention: [image]

I currently work around this by clipping the values, but I would like to know whether there is something built in to constrain the effect of an intervention.
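For reference, the clipping workaround could look roughly like the sketch below. The `samples` frame here is a hypothetical stand-in for the output of counterfactual_samples, with made-up values; only the column names come from the post.

```python
import pandas as pd

# Hypothetical stand-in for the counterfactual output; the values are made up.
samples = pd.DataFrame({
    "thalach": [165.0, 181.5, 143.0],
    "exang": [-1, 0, 1],  # -1 is outside the valid range {0, 1}
})

# Force exang back into its valid range by clipping at the bounds.
samples["exang"] = samples["exang"].clip(lower=0, upper=1)
```

This only masks the symptom: the model still treats exang as an ordered numeric variable.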

bloebp commented 4 weeks ago

Hi, it looks like the categorical variables are encoded as numbers here. Can you try converting them to strings or bools (if binary)? Otherwise, the models will interpret them as discrete (with order) or continuous.
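A minimal sketch of that conversion with pandas, assuming exang and target are both 0/1-encoded as in the post. The small `df_risk` frame below is made-up illustration data, and the conversion would need to happen before the causal mechanisms are assigned and fitted.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the post.
df_risk = pd.DataFrame({
    "thalach": [150.0, 172.0, 131.0],
    "exang": [0, 0, 1],
    "target": [1, 1, 0],
})

# Binary variable: convert to bool so it is no longer read as a number.
df_risk["exang"] = df_risk["exang"].astype(bool)

# Alternatively, convert to strings (also works for more than two categories).
df_risk["target"] = df_risk["target"].astype(str)
```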

Once you convert them to categorical values (strings/bools), however, there is a limitation: we only support point-wise counterfactual estimates (in Pearl's sense). You might need to look at interventional samples instead (which also work with categorical non-root nodes). This would be:

samples = gcm.interventional_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

Note, however, that these are sampled from the interventional distribution, i.e., running it twice will give you slightly different values.