py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License
6.92k stars, 922 forks

Fundamental questions about DoWhy #149

Closed Jami1141 closed 4 years ago

Jami1141 commented 4 years ago

I'm new to the field of causality and would like to use your package for my current project. I was searching for a Python package based on Judea Pearl's graphical models, and then I found your package. I have already worked on uplift modeling, but I want to see how my project works with other causal models. First of all, I have a multiple-treatment feature, a binary output, and some features (40 features). When I was reading the DoWhy documentation, I found that there are two ways to model your data:

1) Model it using a given graph:

    model = CausalModel(
        data=data["df"],
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"])

2) Model it without a graph:

    model = CausalModel(
        data=df,
        treatment=data_dict["treatment_name"],
        outcome=data_dict["outcome_name"],
        common_causes=data_dict["common_causes_names"],
        instruments=data_dict["instrument_names"])
    model.view_model(layout="dot")
    from IPython.display import Image, display
    display(Image(filename="causal_model.png"))

My question is: if I want to use the first method, with a given graph, how do I provide the graph? My data includes features (X1, X2, X3, ..., Xn; 90 features after one-hot encoding), a treatment (multiple categories), and an outcome (binary).

If I use the second method, without giving a graph, what should I put for data_dict["instrument_names"], and what should common_causes be? Features like X1, X2, ...? I don't really understand the relation between common-cause variables, confounders, and instruments. I only have 90 features after one-hot encoding, including numeric and categorical variables. I assume that df is my dataframe including features, treatment, and outcome, where the treatment is df['Treatment'] and the outcome is df['Outcome'] in my data. Is that correct? If yes, what should I put for the instruments? Is it necessary to add an instrument?

Another question: in uplift modeling, I calculate an uplift for each individual for a given binary treatment. Can I get a CATE for an individual using the DoWhy package with a multiple-treatment feature?

Thanks in advance for the explanation.

Tanmay-Kulkarni101 commented 4 years ago

Hey @Jami1141, have you tried out this link? It explains how to use both methods to create causal models.
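For the first method, one way to provide the graph is as a GML string passed to the graph argument. The sketch below builds such a string from column names. The helper build_gml_graph is hypothetical (not part of DoWhy), and it assumes that every listed feature is a common cause of both the treatment and the outcome, which may not hold for a real dataset.

```python
def build_gml_graph(treatment, outcome, common_causes):
    """Build a GML string for a simple confounding graph.

    Assumption: each common cause points to both the treatment and the
    outcome, and the treatment points to the outcome.
    """
    nodes = [treatment, outcome, *common_causes]
    edges = [(treatment, outcome)]
    for cause in common_causes:
        edges.append((cause, treatment))
        edges.append((cause, outcome))

    lines = ["graph [", "  directed 1"]
    lines += [f'  node [ id "{n}" label "{n}" ]' for n in nodes]
    lines += [f'  edge [ source "{s}" target "{t}" ]' for s, t in edges]
    lines.append("]")
    return "\n".join(lines)


# Column names here are illustrative, taken from the question above.
gml = build_gml_graph("Treatment", "Outcome", ["X1", "X2"])

# The string could then be passed on, e.g.:
#   model = CausalModel(data=df, treatment="Treatment",
#                       outcome="Outcome", graph=gml)
```

With many one-hot-encoded columns, generating the string from df.columns this way avoids writing the graph by hand.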

Jami1141 commented 4 years ago

Hi @Tanmay-Kulkarni101. I don't want to specify the graph directly since I have many variables (about 84). That's why I am using the second method, without a graph.

amit-sharma commented 4 years ago

@Jami1141 closing this, assuming that you were able to run it on your dataset. In general, with many variables, it is better to provide the treatment and outcome as named variables; there is also an option to automatically treat all remaining features as confounders (missing_nodes_as_confounders).
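A minimal sketch of the named-variable approach, under some assumptions: the column names "Treatment" and "Outcome" come from the question above, the helpers remaining_columns and build_model are hypothetical, and every other column is treated as a common cause. Rather than relying on missing_nodes_as_confounders, this version lists the confounders explicitly through common_causes and omits instruments, which is an optional argument.

```python
def remaining_columns(columns, treatment, outcome):
    """Return every column that is neither the treatment nor the outcome."""
    excluded = {treatment, outcome}
    return [c for c in columns if c not in excluded]


def build_model(df, treatment="Treatment", outcome="Outcome"):
    """Create a CausalModel with all other columns as common causes.

    Assumption: no instruments are available, so none are passed.
    """
    # Imported lazily so remaining_columns works without dowhy installed.
    from dowhy import CausalModel

    return CausalModel(
        data=df,
        treatment=treatment,
        outcome=outcome,
        common_causes=remaining_columns(df.columns, treatment, outcome),
    )


# Illustrative column list; a real dataframe would supply df.columns.
confounders = remaining_columns(
    ["X1", "X2", "Treatment", "Outcome"], "Treatment", "Outcome"
)
```

Treating every remaining feature as a confounder is a modeling assumption, not something the data can confirm, so it is worth checking against domain knowledge.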