Counterfactual Samples giving invalid values of effect in dowhy gcm

py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

MIT License


I am working on a heart disease dataset with a continuous treatment variable (thalach) and a categorical outcome (target). Another variable affected by the treatment, exang, is also categorical. The two have an inverse relationship, i.e. when thalach increases, exang should decrease. However, when I intervene on thalach using counterfactual_samples, I get invalid values (-1) for exang, which should only take the values 0 and 1:

samples = gcm.counterfactual_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

[Image: before and after intervention]

As a workaround, I am clipping the values, but I would like to know whether there is a built-in way to constrain the effect of an intervention.
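For reference, a clipping workaround of the kind described above might look like the minimal sketch below. It assumes the samples come back as a pandas DataFrame; the `samples` values here are hypothetical, made up only to show an out-of-range entry.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by counterfactual_samples;
# the numbers are invented to show an out-of-range exang value.
samples = pd.DataFrame({
    "thalach": [150.0, 165.0, 143.0],
    "exang": [0.0, -1.0, 0.6],
})

# Force exang back into {0, 1}: clip to [0, 1], then round to the nearest label.
# This only hides the symptom; the model still treats exang as a number.
samples["exang"] = samples["exang"].clip(lower=0, upper=1).round().astype(int)

print(samples["exang"].tolist())  # [0, 0, 1]
```

Clipping keeps the values in range, but a 0 after clipping may simply mean "pushed below 0", so it does not make the counterfactual itself meaningful for a binary variable.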

Hi, it seems the categorical variables are encoded numerically here. Can you try converting them to strings, or to bools if they are binary? Otherwise, the models will interpret them as discrete (with an order) or continuous.
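A minimal sketch of that conversion with plain pandas, assuming `df_risk` stores the binary columns as 0/1 integers (the rows below are a hypothetical excerpt, not the actual dataset):

```python
import pandas as pd

# Hypothetical excerpt of the heart disease data; column names follow the issue.
df_risk = pd.DataFrame({
    "thalach": [150, 172, 130, 161],
    "exang": [0, 0, 1, 1],
    "target": [1, 1, 0, 0],
})

# Binary 0/1 columns become bools, so they are no longer read as ordered numbers.
# A non-binary categorical column could be converted with .astype(str) instead.
for col in ["exang", "target"]:
    df_risk[col] = df_risk[col].astype(bool)

print(df_risk.dtypes)
```

After changing the dtypes, the causal mechanisms would need to be assigned and fitted again (for example with `gcm.auto.assign_causal_mechanisms(causal_model, df_risk)` followed by `gcm.fit(causal_model, df_risk)`), so that classifier-based mechanisms can be chosen for the now-categorical nodes.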

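When you convert them to categorical values (strings/bools), however, there is a limitation: we only support point-wise counterfactual estimates (in Pearl's sense), which reconstruct the noise of each observed sample from an invertible causal mechanism, and the classifier-based mechanisms used for categorical non-root nodes are not invertible. You might need to use interventional samples instead (which also work with categorical non-root nodes). This would be: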
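Point-wise counterfactuals in Pearl's sense follow three steps: abduction (recover each sample's noise), action (apply the intervention), and prediction (recompute with the same noise). The sketch below applies these steps to an additive noise model with a hypothetical linear mechanism for exang, i.e. exang treated as a number. The coefficients are assumptions for illustration, not a fitted DoWhy model, but they show how the result can leave {0, 1}.

```python
import numpy as np

# Hypothetical mechanism exang = f(thalach) + noise, as a regressor might learn it
# when exang is stored as a 0/1 number. Coefficients are assumed, not fitted.
def f(thalach):
    return 2.0 - 0.01 * thalach

thalach_obs = np.array([120.0, 150.0, 190.0])
exang_obs = np.array([1.0, 0.0, 0.0])

# 1. Abduction: reconstruct each sample's noise from its observed values.
noise = exang_obs - f(thalach_obs)

# 2. Action: apply the intervention thalach -> thalach * 1.1.
thalach_cf = thalach_obs * 1.1

# 3. Prediction: push the new thalach through f and add the same noise back.
exang_cf = f(thalach_cf) + noise

print(exang_cf)  # approximately [0.88, -0.15, -0.19]
```

Because the noise is added back unchanged, nothing keeps the counterfactual value inside {0, 1}. A classifier that outputs a category has no such unique per-sample noise to recover, which is why this step does not carry over to categorical non-root nodes.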

samples = gcm.interventional_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

Note, however, that these are sampled from the interventional distribution, i.e., running it twice will give you slightly different values.
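To illustrate what sampling from an interventional distribution means for a categorical node, here is a sketch with a hypothetical logistic mechanism standing in for a fitted classifier (the coefficients and the functions `p_exang` and `interventional_exang` are assumptions, and this is not DoWhy's implementation). Each run draws a fresh category per row, so every value is a valid label, but individual values differ between runs while summary quantities stay stable.

```python
import numpy as np

# Hypothetical mechanism: P(exang = 1 | thalach) decreases as thalach increases.
def p_exang(thalach):
    return 1.0 / (1.0 + np.exp(-(8.0 - 0.055 * thalach)))

def interventional_exang(thalach_obs, rng):
    thalach_do = thalach_obs * 1.1  # do(thalach := thalach * 1.1)
    # Draw a new 0/1 label per row from the mechanism's probability.
    return (rng.random(thalach_do.shape) < p_exang(thalach_do)).astype(int)

thalach_obs = np.linspace(100.0, 200.0, 1000)

run_a = interventional_exang(thalach_obs, np.random.default_rng(1))
run_b = interventional_exang(thalach_obs, np.random.default_rng(2))

# Row-level samples differ between runs, but the estimated
# P(exang = 1 | do(...)) agrees up to Monte Carlo error.
print(run_a.mean(), run_b.mean())
```

In practice this suggests reporting aggregates over the interventional samples (such as the share of exang values that are True) rather than reading individual rows as per-patient counterfactuals.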


Counterfactual Samples giving invalid values of effect in dowhy gcm #1241