y0-causal-inference / y0

❓y0 (pronounced "why not?") is for causal inference in Python
https://y0.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Generate Verma constraints given an ADMG #25

Open djinnome opened 3 years ago

djinnome commented 3 years ago

Input: an ADMG. Output: a list of Verma constraints, each stating that a probability expression is independent of a set of nodes.

Note: this algorithm has already been implemented in the R package causaleffect. In #31, the causaleffect implementation was wrapped and made available through the y0 interface. However, we would like to provide our own implementation, since writing it will give us insight that cannot be gained from reading the R code.

There is already a data structure that represents a Verma constraint in https://github.com/y0-causal-inference/y0/blob/main/src/y0/struct.py. Further, several example graphs from Causal Fusion, together with their Verma constraints, can be imported from https://github.com/y0-causal-inference/y0/blob/main/src/y0/examples.py.

Both of these allow for easy test-driven development.

djinnome commented 3 years ago

Take the graph from Figure 1(a) of "On the Testable Implications of Causal Models with Hidden Variables":

Figure 1(a)

The ADMG is represented in causaleffect as:

library(causaleffect)
library(igraph)
library(ggm)
g <- graph.formula(a -+ b, b -+ c, c -+ d , b -+ d, d -+ b, simplify = FALSE)
g <- set.edge.attribute(graph = g, name = "description", index = c(4,5), value = "U")

and the Verma constraint is:

verma.constraints(g)
[[1]]
[[1]]$rhs.cfactor
[1] "Q[\\{d\\}](c,d)"

[[1]]$rhs.expr
[1] "\\sum_{u_{1},c}P(d|u_{1},c)P(c)P(u_{1})"

[[1]]$lhs.cfactor
[1] "\\sum_{b}Q[\\{b,d\\}](a,b,c,d)"

[[1]]$lhs.expr
[1] "\\sum_{b}P(d|a,b,c)P(b|a)"

[[1]]$vars
[1] "a"
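
A constraint like this one can be sanity-checked numerically. The following is a minimal sketch, not y0 or causaleffect code: it assumes binary variables and a hidden common cause `u` standing in for the bidirected edge between b and d, draws random conditional probability tables compatible with the Figure 1(a) graph, and evaluates the `lhs.expr` above, \sum_{b}P(d|a,b,c)P(b|a), for both values of a. All function names (`random_cpt`, `observed_joint`, `verma_lhs`) are hypothetical.

```python
import itertools
import random


def random_cpt(rng, n_parents):
    """Map each binary parent assignment to P(child = 1 | parents)."""
    return {
        parents: rng.uniform(0.1, 0.9)
        for parents in itertools.product((0, 1), repeat=n_parents)
    }


def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def observed_joint(seed=0):
    """Build P(a, b, c, d) for a -> b -> c -> d with hidden u -> b and u -> d."""
    rng = random.Random(seed)
    p_u, p_a = random_cpt(rng, 0), random_cpt(rng, 0)
    p_b, p_c, p_d = random_cpt(rng, 2), random_cpt(rng, 1), random_cpt(rng, 2)
    joint = {}
    for a, b, c, d in itertools.product((0, 1), repeat=4):
        # Marginalize out the hidden variable u.
        joint[a, b, c, d] = sum(
            bern(p_u[()], u)
            * bern(p_a[()], a)
            * bern(p_b[a, u], b)
            * bern(p_c[(b,)], c)
            * bern(p_d[c, u], d)
            for u in (0, 1)
        )
    return joint


def marginal(joint, keep):
    """Sum the joint over every variable whose position is not in keep."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def verma_lhs(joint, a, c, d):
    """Evaluate sum_b P(d | a, b, c) P(b | a) from the observed joint."""
    p_abc = marginal(joint, (0, 1, 2))
    p_ab = marginal(joint, (0, 1))
    p_a = marginal(joint, (0,))
    return sum(
        (joint[a, b, c, d] / p_abc[a, b, c]) * (p_ab[a, b] / p_a[(a,)])
        for b in (0, 1)
    )


joint = observed_joint()
for c, d in itertools.product((0, 1), repeat=2):
    # The two printed values should agree up to floating-point error.
    print((c, d), verma_lhs(joint, 0, c, d), verma_lhs(joint, 1, c, d))
```

Under these assumptions the expression reduces to \sum_{u}P(d|c,u)P(u), which is why the value for a = 0 matches the value for a = 1 in every row.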

cthoyt commented 3 years ago

So for the Q construct, do we need to introduce a new kind of DSL element, or would it be enough to just have another named tuple representing the contents of this kind of expression for this algorithm?
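
To make the named tuple option concrete, here is a rough sketch of what it could look like. It is not part of y0: the `QFactor` type, its field names, and the `parse_q` helper are all hypothetical, and the parser only handles a bare c-factor such as `Q[\{d\}](c,d)` as emitted by causaleffect, not sums or fractions of c-factors.

```python
import re
from typing import NamedTuple


class QFactor(NamedTuple):
    """Hypothetical container for Q[district](domain)."""

    district: frozenset  # variables inside the square brackets
    domain: tuple  # variables inside the parentheses, in order


# Matches strings like Q[\{a,e\}](a,d,e), with literal backslashes before the braces.
_Q_PATTERN = re.compile(r"Q\[\\\{([^\\}]*)\\\}\]\(([^)]*)\)")


def parse_q(text):
    """Parse a bare c-factor string into a QFactor, or raise ValueError."""
    match = _Q_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bare c-factor: {text!r}")
    district, domain = (tuple(part.split(",")) for part in match.groups())
    return QFactor(frozenset(district), domain)


print(parse_q(r"Q[\{d\}](c,d)"))
```

A plain tuple like this keeps the algorithm's bookkeeping separate from the DSL, while a dedicated DSL element would let a Q expression be composed with sums and fractions like any other expression.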

djinnome commented 3 years ago

Every probabilistic expression has an implicit Q type associated with it. It would be really cool if each probabilistic expression knew its Q type, and if there were a predicate that could answer whether an expression is of a particular Q type.

Sincerely,

Jeremy

djinnome commented 3 years ago

Here is an example of a graph with multiple Verma constraints, where the first Verma constraint contains multiple variables in the $vars slot.

The first constraint means that $Q[\{e\}](d,e) \perp\!\!\!\perp b,c$, where $$Q[\{e\}](d,e) = \frac{Q[\{c,e\}](b,c,d,e)}{\sum_{e}Q[\{c,e\}](b,c,d,e)} = \frac{\sum_{a}P(e|a,b,c,d)P(c|a,b)P(a)}{\sum_{a,e}P(e|a,b,c,d)P(c|a,b)P(a)}$$

(I wish github comments could include latex rendering)

Interestingly, the Verma constraint for the same graph has a different denominator in CausalFusion.net!

CausalFusion.net

g <- graph.formula(a -+ b, b -+ c, c -+ d , d -+ e, a -+ c, c -+ a, a -+ e, e -+ a, simplify = FALSE)
g <- set.edge.attribute(graph = g, name = "description", index = c(5,6), value = "U")
g <- set.edge.attribute(graph = g, name = "description", index = c(7,8), value = "U")

verma.constraints(g)
[[1]]
[[1]]$rhs.cfactor
[1] "Q[\\{e\\}](d,e)"

[[1]]$rhs.expr
[1] "\\sum_{u_{2},d}P(e|u_{2},d)P(d)P(u_{2})"

[[1]]$lhs.cfactor
[1] "\\frac{Q[\\{c,e\\}](b,c,d,e)}{\\sum_{e}Q[\\{c,e\\}](b,c,d,e)}"

[[1]]$lhs.expr
[1] "\\frac{\\sum_{a}P(e|a,b,c,d)P(c|a,b)P(a)}{\\sum_{a,e}P(e|a,b,c,d)P(c|a,b)P(a)}"

[[1]]$vars
[1] "b" "c"

[[2]]
[[2]]$rhs.cfactor
[1] "Q[\\{a,e\\}](a,d,e)"

[[2]]$rhs.expr
[1] "\\sum_{u_{2},d,u_{1}}P(e|u_{2},d)P(a|u_{1},u_{2})P(u_{1})P(d)P(u_{2})"

[[2]]$lhs.cfactor
[1] "\\sum_{c}Q[\\{a,c,e\\}](a,b,c,d,e)"

[[2]]$lhs.expr
[1] "\\sum_{c}P(e|a,b,c,d)P(c|a,b)P(a)"

[[2]]$vars
[1] "b"

[[3]]
[[3]]$rhs.cfactor
[1] "Q[\\{e\\}](d,e)"

[[3]]$rhs.expr
[1] "\\sum_{u_{2},d}P(e|u_{2},d)P(d)P(u_{2})"

[[3]]$lhs.cfactor
[1] "\\sum_{a,c}Q[\\{a,c,e\\}](a,b,c,d,e)"

[[3]]$lhs.expr
[1] "\\sum_{a,c}P(e|a,b,c,d)P(c|a,b)P(a)"

[[3]]$vars
[1] "b"
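
The sets inside the square brackets of the c-factors above, such as Q[\{a,c,e\}], correspond to districts (c-components) of the ADMG: groups of nodes connected through bidirected edges. As a rough illustration, and not the y0 or causaleffect implementation, the sketch below computes districts for both graphs in this thread from a plain list of bidirected edges. The `districts` function and its argument format are assumptions made for this example.

```python
from collections import defaultdict


def districts(nodes, bidirected):
    """Return the connected components of the bidirected part of an ADMG."""
    neighbors = defaultdict(set)
    for u, v in bidirected:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen, result = set(), []
    for node in nodes:
        if node in seen:
            continue
        # Depth-first search over bidirected edges only.
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        result.append(frozenset(component))
    return result


# Figure 1(a): a -> b -> c -> d with b <-> d.
figure_1a = districts("abcd", [("b", "d")])
# Second graph: a -> b -> c -> d -> e with a <-> c and a <-> e.
second_graph = districts("abcde", [("a", "c"), ("a", "e")])

print(sorted(sorted(d) for d in figure_1a))
print(sorted(sorted(d) for d in second_graph))
```

Directed edges are deliberately ignored here, since they play no role in determining districts. They would matter in later steps of a full implementation, which this sketch does not attempt.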