Same causal graph yields different backdoor identification in different versions of DoWhy

lx2m17 commented 2 years ago

Hi, I used the same causal graph in two versions of DoWhy (0.7.1 and 0.6), but the effect identification is different. This is very confusing, please help.

Version 0.7.1

Estimand type: nonparametric-ate

Estimand : 1

Estimand name: backdoor

Estimand expression:

    d
────────(Expectation(ord_num|mt_charge_fee_30days,high_confidence_age,income_level))
d[isᵢₘₚ]

Estimand assumption 1, Unconfoundedness: If U→{is_imp} and U→ord_num then P(ord_num|is_imp,charge_fee_30days,high_confidence_age,income_level,U) = P(ord_num|is_imp,charge_fee_30days,high_confidence_age,income_level)

Estimand : 2

Estimand name: iv

No such variable(s) found!

Estimand : 3

Estimand name: frontdoor

No such variable(s) found!

Version 0.6

Estimand type: nonparametric-ate

Estimand : 1

Estimand name: backdoor

Estimand expression:

    d
────────(Expectation(ord_num|career,is_bind_user,marriage,count_nofee_order_num,alphau3,charge_fee_30days,select_hesitate_score,high_confidence_gender,is_7_new_coup,income_level,ord_amt_90days,high_confidence_age))
d[isᵢₘₚ]

Estimand assumption 1, Unconfoundedness: If U→{is_imp} and U→ord_num then P(ord_num|is_imp,career,is_bind_user,marriage,count_nofee_order_num,alphau3,charge_fee_30days,select_hesitate_score,high_confidence_gender,is_7_new_coup,income_level,ord_amt_90days,high_confidence_age,U) = P(ord_num|is_imp,career,is_bind_user,marriage,count_nofee_order_num,alphau3,mt_charge_fee_30days,select_hesitate_score,high_confidence_gender,is_7_new_coup,income_level),high_confidence_age)

Estimand : 2

Estimand name: iv

No such variable found!

Estimand : 3

Estimand name: frontdoor

No such variable found!

[Image: causal_model]

amit-sharma commented 2 years ago

Both versions are correct. Multiple valid backdoor sets are possible. v0.7 changed the default so that it tries to show a minimal backdoor set. If you'd like to see all valid backdoor sets, pass method_name="exhaustive" to identify_effect in v0.7. You can then call identified_estimand.__str__(show_all_backdoor_sets=True) or inspect identified_estimand.estimands.

More generally, the backdoor condition can be satisfied by multiple subsets of variables, so which backdoor subset to use is a statistical question that depends on the bias-variance tradeoff. This is a good read: https://ftp.cs.ucla.edu/pub/stat_ser/r493.pdf
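
To make the "multiple valid backdoor sets" point concrete, here is a minimal pure-Python sketch that enumerates every covariate subset satisfying the backdoor criterion on a small toy graph. It does not use DoWhy and is not DoWhy's implementation. The graph, the node names (T, Y, W1, W2, W3) and the helper functions are all hypothetical and unrelated to the graph in this issue, and the path enumeration is only practical for tiny graphs.

```python
from itertools import combinations

# Hypothetical toy DAG (NOT the graph from this issue): T is the treatment,
# Y the outcome, and W1..W3 are pre-treatment covariates.
EDGES = {("W1", "T"), ("W1", "Y"), ("W2", "W1"), ("W2", "Y"), ("W3", "Y"), ("T", "Y")}


def descendants(node, edges):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, goal, edges):
    """All simple paths between start and goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            paths.append(path)
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def is_blocked(path, z, edges):
    """True if conditioning on z blocks the path (d-separation rules)."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in edges and (right, mid) in edges
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if mid not in z and not (descendants(mid, edges) & z):
                return True
        elif mid in z:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(z, treatment, outcome, edges):
    """Check the backdoor criterion for adjustment set z."""
    z = set(z)
    # Condition 1: z contains no descendant of the treatment.
    if z & descendants(treatment, edges):
        return False
    # Condition 2: z blocks every path that starts with an arrow into the
    # treatment. Dropping the treatment's outgoing edges leaves only those.
    backdoor_graph = {(a, b) for a, b in edges if a != treatment}
    paths = simple_paths(treatment, outcome, backdoor_graph)
    return all(is_blocked(p, z, edges) for p in paths)


covariates = ["W1", "W2", "W3"]
valid_sets = [
    set(subset)
    for size in range(len(covariates) + 1)
    for subset in combinations(covariates, size)
    if satisfies_backdoor(subset, "T", "Y", EDGES)
]

for adjustment_set in valid_sets:
    print(sorted(adjustment_set))
```

In this toy graph every subset that contains W1 is valid, so the smallest set and the set of all three covariates both identify the same effect. That is the same situation as the two outputs above: a short backdoor set in 0.7.1 and a long one in 0.6.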

lx2m17 commented 2 years ago

Thanks for the reply, it really helps!
