y0-causal-inference / y0

❓y0 (pronounced "why not?") is for causal inference in Python
https://y0.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Augmented variable mechanism #248

Open adamrupe opened 2 months ago

adamrupe commented 2 months ago

Furthermore, Algorithm 2 must take as input the causal query, composed of the intervention and the outcome (and potentially any conditioned variables) so that we know which Q variables must be augmented.
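Here is a minimal sketch of that input. It assumes a hypothetical `CausalQuery` container and a dict `promoted` that maps each subunit variable to the variables its promoted Q variable is conditioned on. This is an illustration, not y0's API. It only flags outcome and conditioning variables whose promoted Q variable is conditional, because no marginal Q variable exists for them yet. It deliberately ignores the harder part: other variables that are needed to build these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalQuery:
    """Hypothetical query passed to Algorithm 2 (not part of y0's API)."""

    outcomes: frozenset
    treatments: frozenset = frozenset()
    conditions: frozenset = frozenset()


def variables_to_augment(query, promoted):
    """Return query variables that lack a marginal promoted Q variable.

    ``promoted`` maps each subunit variable to the tuple of variables its
    promoted Q variable is conditioned on, e.g. {"w": ("z",)} for Q^{w|z};
    an empty tuple means the promoted Q variable is already marginal.
    """
    needed = query.outcomes | query.conditions
    return {v for v in needed if promoted.get(v)}


# Chain from this comment: Q_a, Q_{z|a}, Q_{w|z}; the query asks about W.
promoted = {"a": (), "z": ("a",), "w": ("z",)}
query = CausalQuery(outcomes=frozenset({"w"}), treatments=frozenset({"a"}))
print(variables_to_augment(query, promoted))  # {'w'}
```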

Now, some variables that are not in the query also need to be augmented, because they are needed to generate the Q variables in the query.

For example, in this graph: [figure: A4_p_mod]

We want to augment $Q_w$:

But we have $Q_{w|z}$ and $Q_{z|a}$.

We need $Q_{w|z}$ and $Q_z$.

So should we then augment $Q_z$ from $Q_{z|a}$ and $Q_a$?

Originally posted by @djinnome in https://github.com/y0-causal-inference/y0/issues/239#issuecomment-2347435292

adamrupe commented 2 months ago

The authors give the general form of augmentation variables as:

[figure: HCM_J 1]

I don't understand, though, why they say script R is the intervention variable; shouldn't it be script L? In fact, in the discussion at the bottom they describe the augmentation variable $q_i^{a|x}(a; do(x))$, even though everywhere else they use $a$ for the intervention variable.

That point aside, I am trying to understand the mechanism structure described in (69). For simple cases, I think I understand why they consider direct subunit ancestors for the parents of the augmented variable: it seems to be because of the dependencies implied by the subunit graph (which they allude to). For example, consider HCM (p) and its augmented collapsed model (q) in Figure A3: [figure: A3_p_q] The original promoted $Q$ variables are $Q^a$, $Q^z$, and $Q^{w|a,z}$. They augment $Q^w$, and evidently its parents (i.e., the inputs to its mechanism) are all three of the original promoted $Q$ variables just mentioned. Mathematically, we have $\Pr(W) = \int \int \Pr(A,W,Z) \, dA \, dZ$ and $\Pr(A,W,Z) = \Pr(W|A,Z) \Pr(A,Z)$, but from the subunit graph $A$ and $Z$ are independent, so $\Pr(A,Z) = \Pr(A) \Pr(Z)$. We therefore arrive at their given mechanism: $\Pr(W) = \int \int \Pr(W|A,Z) \Pr(A) \Pr(Z) \, dA \, dZ$, i.e., $Q^w = m(Q^{w|a,z}, Q^a, Q^z)$.
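Below is a discrete numerical check of that mechanism. It uses sums in place of the integrals and randomly generated tables, and all names are illustrative. Building $\Pr(A, Z, W)$ from the factorization and marginalizing it directly gives the same $\Pr(W)$ as $m(Q^{w|a,z}, Q^a, Q^z)$.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_cpt(*shape):
    """Random (conditional) probability table, normalized over the last axis."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)


n_a, n_z, n_w = 2, 3, 4
q_a = random_cpt(n_a)                      # Pr(A)
q_z = random_cpt(n_z)                      # Pr(Z), independent of A in the subunit graph
q_w_given_az = random_cpt(n_a, n_z, n_w)   # Pr(W | A, Z), indexed [a, z, w]


def mechanism_w(q_w_given_az, q_a, q_z):
    """Q^w = m(Q^{w|a,z}, Q^a, Q^z): sum out A and Z."""
    return np.einsum("azw,a,z->w", q_w_given_az, q_a, q_z)


# Reference: build the full joint Pr(A, Z, W) and marginalize it directly.
joint = q_a[:, None, None] * q_z[None, :, None] * q_w_given_az
q_w = mechanism_w(q_w_given_az, q_a, q_z)
print(np.allclose(q_w, joint.sum(axis=(0, 1))))  # True
```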

However, if we consider the HCM in the comment above, which has a chain structure A -> Z -> W in the subunit graph, it appears that considering only direct subunit ancestors is not sufficient. Equivalently, you may need to augment additional variables for the target augmented variable to be valid.

For visual clarity, let's ignore the unobserved confounder U. The HCGM is [figure: chain_HCGM] and the collapsed model is [figure: chain_col]

We now want to augment $Q^w$. The direct subunit ancestor of W is Z, so from (69) I read the mechanism as $Q^w = m(Q^{w|z}, Q^{z|a})$. However, this does not seem sufficient to specify the distribution $\Pr(W)$ without any information on A. Working out the probabilities as we did above, we find $\Pr(W) = \int \int \Pr(A,W,Z) \, dA \, dZ$, and we can write $\Pr(A,W,Z) = \Pr(A)\Pr(W,Z | A) = \Pr(A) \Pr(Z|A) \Pr(W| A,Z)$. The chain A -> Z -> W in the subunit graph implies that A and W are independent conditioned on Z, so we can drop A from the last conditional and write $\Pr(A,W,Z) = \Pr(A)\Pr(Z|A)\Pr(W|Z)$. This is now stated in terms of promoted $Q$ variables in the collapsed model, so the mechanism for $Q^w$ should be $Q^w = m(Q^a, Q^{z|a}, Q^{w|z})$: [figure: chain_augmented1]
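Here is a small discrete sketch that makes the insufficiency concrete, using made-up tables. It keeps $Q^{z|a}$ and $Q^{w|z}$ fixed and changes only $Q^a$. This changes $\Pr(W)$, so a mechanism for $Q^w$ that omits $Q^a$ cannot be correct in general.

```python
import numpy as np

# Fixed conditionals for the subunit chain A -> Z -> W (two states each).
q_z_given_a = np.array([[0.9, 0.1],   # Pr(Z | A=0)
                        [0.2, 0.8]])  # Pr(Z | A=1)
q_w_given_z = np.array([[0.7, 0.3],   # Pr(W | Z=0)
                        [0.1, 0.9]])  # Pr(W | Z=1)


def mechanism_w(q_a, q_z_given_a, q_w_given_z):
    """Q^w = m(Q^a, Q^{z|a}, Q^{w|z}) = sum over a, z of Pr(a) Pr(z|a) Pr(w|z)."""
    return np.einsum("a,az,zw->w", q_a, q_z_given_a, q_w_given_z)


# Same Q^{z|a} and Q^{w|z}, two different Q^a:
p_w_1 = mechanism_w(np.array([1.0, 0.0]), q_z_given_a, q_w_given_z)  # approximately [0.64, 0.36]
p_w_2 = mechanism_w(np.array([0.0, 1.0]), q_z_given_a, q_w_given_z)  # approximately [0.22, 0.78]
```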

From the final product term in (69), the assumption seems to be that, to get the augmented variable mechanism from a joint distribution, you integrate out marginal variables, not conditional variables. Among the original promoted variables we have $Q^{z|a}$, not $Q^z$. In the original comment above, we wondered whether this means we must first augment $Q^z$ and then use it to augment $Q^w = m(Q^{w|z}, Q^z)$. If we do this double augmentation, we get a different augmented graph, but because of the deterministic relations between augmented variables and their parents, it is equivalent to the augmented graph above: [figure: chain_augmented2] In particular, the augmented variable we really care about, $Q^w$, still depends on $Q^a$, this time through $Q^z$.
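Here is a quick sketch of that equivalence with made-up discrete tables. Augmenting $Q^z = m(Q^{z|a}, Q^a)$ first and then $Q^w = m(Q^{w|z}, Q^z)$ gives the same $\Pr(W)$ as the single mechanism $m(Q^a, Q^{z|a}, Q^{w|z})$.

```python
import numpy as np

q_a = np.array([0.3, 0.7])                        # Pr(A)
q_z_given_a = np.array([[0.9, 0.1], [0.2, 0.8]])  # Pr(Z | A), indexed [a, z]
q_w_given_z = np.array([[0.7, 0.3], [0.1, 0.9]])  # Pr(W | Z), indexed [z, w]


def augment_z(q_z_given_a, q_a):
    """First augmentation: Q^z = m(Q^{z|a}, Q^a) = sum over a of Pr(a) Pr(z|a)."""
    return q_a @ q_z_given_a


def augment_w(q_w_given_z, q_z):
    """Second augmentation: Q^w = m(Q^{w|z}, Q^z) = sum over z of Pr(z) Pr(w|z)."""
    return q_z @ q_w_given_z


two_step = augment_w(q_w_given_z, augment_z(q_z_given_a, q_a))
one_step = np.einsum("a,az,zw->w", q_a, q_z_given_a, q_w_given_z)
print(np.allclose(two_step, one_step))  # True; both are approximately [0.346, 0.654]
```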

But A is not a direct subunit ancestor of W in the original HCM. So I feel like there is something I'm not understanding in J.1 and Equation (69). Are they implicitly assuming no subunit chains like this for valid augmentation variables? It seems like it is still possible to create a valid augmentation variable, as just described, but the mechanism seems more complicated than what is stated in (69).

adamrupe commented 2 months ago

For the general case, we will need to consider a topological sort of the subunit-level graph, in case we need to add several additional augmented variables (recursively) for the desired augmentation variable to be valid.
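Here is a minimal sketch of that ordering. It assumes the subunit graph is given as a dict that maps each node to its set of parents, which is a hypothetical representation and not y0's graph classes. The sketch restricts the graph to the target and its ancestors. It then uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), so parents always come before their children.

```python
from graphlib import TopologicalSorter


def ancestors(parents, node):
    """All (direct and indirect) ancestors of ``node`` in a {node: parents} DAG."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def augmentation_order(parents, target):
    """Topological order of ``target`` and its ancestors, parents first."""
    relevant = ancestors(parents, target) | {target}
    sub = {n: set(parents.get(n, ())) & relevant for n in relevant}
    return list(TopologicalSorter(sub).static_order())


chain = {"A": set(), "Z": {"A"}, "W": {"Z"}}
print(augmentation_order(chain, "W"))  # ['A', 'Z', 'W']
```

When the order is not unique (for example, two siblings that share a parent), `static_order` returns one valid order. It does not choose among them in any principled way.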

adamrupe commented 1 month ago

Let $X$ be the desired augmentation variable, and $\text{PA}_S(X)$ be the (direct) subunit parents of $X$.
From the definition of the promoted $Q$ variables, start with

$Q^{x|\text{PA}_S(x)} \rightarrow \Pr(X | \text{PA}_S(X))$

From the chain rule, the joint probability is

$\Pr(X, \text{PA}_S(X)) = \Pr(X | \text{PA}_S(X)) \Pr(\text{PA}_S(X))$

Because joint distributions are not explicitly represented in causal graphs, we have to break down the joint distribution over the subunit parents into conditional and marginal distributions.

A necessary condition for representing the joint distribution of the parents, which the HCM authors implicitly (and incorrectly) assume always holds, is the closure:

$\text{PA}_S(\text{PA}_S(X)) \subseteq \text{PA}_S(X)$

That is, the parents of the parents (i.e., the grandparents) of $X$ are contained in the set of parents of $X$.
If this closure is not satisfied, there will be a conditional distribution over (at least) one of the parents for which we do not have the distribution of the conditioning variable. Therefore, by the chain rule, we cannot represent the joint distribution of $\text{PA}_S(X)$.
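The closure condition is easy to check mechanically. The sketch below uses the same hypothetical `{node: parents}` dict for the subunit graph. It returns every pair $(P, R)$ with $P \in \text{PA}_S(X)$, $R \in \text{PA}_S(P)$, and $R \notin \text{PA}_S(X)$. An empty result means the closure holds.

```python
def closure_violations(parents, x):
    """Pairs (P, R) with P in PA_S(x) and R in PA_S(P), but R not in PA_S(x)."""
    pa_x = set(parents.get(x, ()))
    return {(p, r) for p in pa_x for r in parents.get(p, ()) if r not in pa_x}


collider = {"A": set(), "Z": set(), "W": {"A", "Z"}}  # subunit graph of the Figure A3 example
chain = {"A": set(), "Z": {"A"}, "W": {"Z"}}          # A -> Z -> W
print(closure_violations(collider, "W"))  # set()
print(closure_violations(chain, "W"))     # {('Z', 'A')}
```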

To be concrete, let $P \in \text{PA}_S(X)$ be a subunit parent of $X$ and let $R$ be a (subunit) parent of $P$ that is not also a (subunit) parent of $X$: $R \in \text{PA}_S(P)$, $R \notin \text{PA}_S(X)$. Then the promoted $Q$ variable of $P$ is conditioned on $R$: $Q^{p|r}$, which represents the distribution $\Pr(P | R)$. Since $R$ is not a (subunit) parent of $X$, it is not included in the augmentation mechanism given by the HCM authors (which is only in terms of $\text{PA}_S(X)$).

The simplest thing to do then is to recursively augment $P$. This removes the dependence on $R \notin \text{PA}_S(X)$ and gives the desired closure for the mechanism of $Q^x$.
The closure condition above gives the criterion for recursion. If there is a parent $P \in \text{PA}_S(X)$ that has a parent $R \notin \text{PA}_S(X)$, then augment $P$ to create the marginal promoted variable $Q^p$. Disconnect $P$ from its parents in the subunit graph. Now augment $X$ with the newly added $Q^p$ and the modified subunit graph. Note that the augmentation of $P$ may require adding yet more augmentation variables (hence the recursion).
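Here is a minimal symbolic sketch of this recursion. It again assumes a hypothetical `{node: parents}` dict for the subunit graph. Mechanisms are returned as nested `(name, inputs)` pairs of promoted $Q$ variable names. Parents are visited in sorted order, which is an arbitrary choice, since the order may matter.

```python
def q_name(var, given=()):
    """Name of a promoted Q variable, e.g. Q^{w|z} for Pr(W | Z)."""
    v = var.lower()
    if not given:
        return f"Q^{v}"
    cond = ",".join(sorted(g.lower() for g in given))
    return f"Q^{{{v}|{cond}}}"


def augment(parents, x):
    """Recursively build the mechanism for the marginal Q^x.

    Returns Q^x's name if it is already marginal, otherwise a pair
    (name, inputs), where each input is a promoted Q variable name or a
    nested pair for an intermediate augmentation.
    """
    parents = {n: set(ps) for n, ps in parents.items()}  # work on a copy
    pa_x = parents.get(x, set())
    if not pa_x:
        return q_name(x)
    inputs = []
    for p in sorted(pa_x):
        if parents.get(p, set()) - pa_x:
            # Closure violated: P has a parent outside PA_S(X), so augment P
            # first and then disconnect P from its parents.
            inputs.append(augment(parents, p))
            parents[p] = set()
        else:
            inputs.append(q_name(p, parents.get(p, set())))
    inputs.append(q_name(x, pa_x))
    return (q_name(x), inputs)


chain = {"A": set(), "Z": {"A"}, "W": {"Z"}}
print(augment(chain, "W"))
# ('Q^w', [('Q^z', ['Q^a', 'Q^{z|a}']), 'Q^{w|z}'])
```

On the Figure A3 subunit graph, this sketch returns `('Q^w', ['Q^a', 'Q^z', 'Q^{w|a,z}'])`, which matches (69).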

The above does not consider augmenting conditional variables, which is done in some examples in the HCM paper. Also, there may be multiple parents $P \in \text{PA}_S(X)$ that have parents not in $\text{PA}_S(X)$ and thus need to be themselves augmented. These are different branches in the recursion, and for complicated subunit graphs these branches may interact. It may be that the order matters when augmenting multiple additional variables, and hence we would need to choose or specify some topological sorting of the subunit graph.

This recursive algorithm correctly generates the augmentation mechanism for all non-conditional augmentation variables given in the examples of the HCM paper.

adamrupe commented 1 month ago

The recursive algorithm described above fails for more complicated subunit graphs with dependencies between intermediate variables that are augmented to get to the desired augmented variable.

Consider the following subunit graph with a diamond motif: [figure: diamond] The promoted $Q$ variables in the collapsed model are $Q^c$, $Q^{b|c}$, $Q^{d|c}$, and $Q^{a|b,d}$. Assume that we want the marginal subunit distribution over $A$, so we want to augment $Q^a$.
Following the discussion above, from the chain rule we want $\Pr(A) = \int\int \Pr(A, B, D) \, dB \, dD = \int \int \Pr(A| B,D) \Pr(B,D) \, dB \, dD$, since we have $\Pr(A | B,D)$. However, the recursive algorithm described above would give the marginals $Q^b \sim \Pr(B)$ and $Q^d \sim \Pr(D)$, not the required joint distribution $\Pr(B, D)$. Note that $B$ and $D$ are not independent (they share the parent $C$), so we cannot simply multiply the marginals to get the joint.
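Here is a small numerical illustration with made-up tables; the names are illustrative. When $B$ and $D$ share the parent $C$, the product of the augmented marginals $Q^b$ and $Q^d$ is not the joint $\Pr(B, D)$.

```python
import numpy as np

q_c = np.array([0.5, 0.5])                        # Pr(C)
q_b_given_c = np.array([[0.9, 0.1], [0.1, 0.9]])  # Pr(B | C), indexed [c, b]
q_d_given_c = np.array([[0.9, 0.1], [0.1, 0.9]])  # Pr(D | C), indexed [c, d]

# Correct joint: Pr(B, D) = sum over c of Pr(c) Pr(B|c) Pr(D|c)
p_bd = np.einsum("c,cb,cd->bd", q_c, q_b_given_c, q_d_given_c)
# approximately [[0.41, 0.09], [0.09, 0.41]]

# What the recursive algorithm yields: separate marginals Q^b and Q^d
q_b = q_c @ q_b_given_c
q_d = q_c @ q_d_given_c
p_bd_naive = np.outer(q_b, q_d)
# approximately [[0.25, 0.25], [0.25, 0.25]]
```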

New formalism for augmentation mechanisms

Because the creation of joint distributions is the crucial step in identifying which variables belong in an augmentation mechanism, it seems to make the most sense to explicitly represent joint distributions as some kind of "intermediary" variables in the collapsed model. We therefore separate augmentation mechanisms into two distinct (deterministic) components. Originally, augmentation mechanisms are given as e.g. $Q^x = f(\cdot)$, which represents the combination of creating a joint variable and then marginalizing everything except $X$. This can be equivalently given as the composition of two functions $f = h \circ g$, where $g$ creates a joint distribution and $h$ marginalizes everything except $X$.
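Here is a minimal sketch of the two components on discrete tables. It assumes each distribution is a NumPy array paired with a string that names its axes, one letter per subunit variable; this representation is only for illustration. `g` multiplies chain-rule factors into a joint, and `h` sums out every axis that is not kept. Composing them as $f = h \circ g$ recovers the one-step mechanism, shown here for the chain A -> Z -> W.

```python
import numpy as np


def g(*factors):
    """Join: multiply chain-rule factors into an intermediary joint variable.

    Each factor is a pair (table, axes), with one letter per array axis,
    e.g. (Pr(Z|A), "az"). The joint's axes are the sorted union of letters.
    """
    tables = [table for table, _ in factors]
    labels = [axes for _, axes in factors]
    out = "".join(sorted(set("".join(labels))))
    return np.einsum(",".join(labels) + "->" + out, *tables), out


def h(joint, keep):
    """Marginalize: sum out every axis of ``joint`` whose letter is not in ``keep``."""
    table, axes = joint
    return np.einsum(f"{axes}->{keep}", table), keep


def f(*factors, keep):
    """The original one-step augmentation mechanism, f = h o g."""
    return h(g(*factors), keep)


q_a = (np.array([0.3, 0.7]), "a")
q_z_given_a = (np.array([[0.9, 0.1], [0.2, 0.8]]), "az")
q_w_given_z = (np.array([[0.7, 0.3], [0.1, 0.9]]), "zw")
q_w, _ = f(q_a, q_z_given_a, q_w_given_z, keep="w")  # approximately [0.346, 0.654]
```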

Diamond subgraph motif example

From the conditional independence of $B$ and $D$ given $C$ implied by the subunit graph shown above, we have $\Pr(B, C, D) = \Pr(B|C) \Pr(D|C) \Pr(C)$, so we create the intermediary joint variable $Q^{b,c,d} = g(Q^{b|c}, Q^{d|c}, Q^c)$. We can then marginalize out $C$ to get another intermediary joint variable, $Q^{b,d} = h(Q^{b,c,d})$. From the chain rule, we can create a final intermediary variable $Q^{a,b,d} = g(Q^{a|b,d}, Q^{b,d})$, and then arrive at our desired augmentation variable $Q^a = h(Q^{a,b,d})$. This is shown graphically as follows:

[figure: diamond_2joints]

This formalism with intermediary joint variables is a more intuitive approach to creating augmentation mechanisms, and it has a lot of flexibility following the chain rule. For example, a straightforward thing to do is to simply consider the joint distribution over all subunit variables. From the chain rule and the conditional independencies of the subunit graph, we have the following for the diamond motif: [figure: diamond_1joint]
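Here is a numerical sketch of both diamond constructions. It uses hypothetical `g`/`h` helpers on tables with lettered axes and made-up conditional tables. Two routes give the same $Q^a$: the two intermediary joints $Q^{b,c,d} \to Q^{b,d} \to Q^{a,b,d}$, and the single joint over all four subunit variables. The product of marginals from the recursive algorithm gives a different result.

```python
import numpy as np


def g(*factors):
    """Join chain-rule factors (table, axes) into a joint over the union of axes."""
    labels = [axes for _, axes in factors]
    out = "".join(sorted(set("".join(labels))))
    return np.einsum(",".join(labels) + "->" + out, *(t for t, _ in factors)), out


def h(joint, keep):
    """Marginalize: sum out every axis of the joint that is not in ``keep``."""
    table, axes = joint
    return np.einsum(f"{axes}->{keep}", table), keep


q_c = (np.array([0.5, 0.5]), "c")
q_b_given_c = (np.array([[0.9, 0.1], [0.1, 0.9]]), "cb")
q_d_given_c = (np.array([[0.9, 0.1], [0.1, 0.9]]), "cd")
p_a1 = np.array([[0.9, 0.1], [0.1, 0.9]])  # Pr(A=1 | B=b, D=d), high when b == d
q_a_given_bd = (np.stack([1 - p_a1, p_a1], axis=-1), "bda")

# Two intermediary joints: Q^{b,c,d} -> Q^{b,d} -> Q^{a,b,d} -> Q^a
q_bd = h(g(q_c, q_b_given_c, q_d_given_c), "bd")
q_a_two, _ = h(g(q_a_given_bd, q_bd), "a")

# One intermediary joint over all subunit variables: Q^{a,b,c,d} -> Q^a
q_a_one, _ = h(g(q_c, q_b_given_c, q_d_given_c, q_a_given_bd), "a")

# Recursive algorithm: independent marginals Q^b and Q^d (wrong for this graph)
q_b = h(g(q_c, q_b_given_c), "b")
q_d = h(g(q_c, q_d_given_c), "d")
q_a_naive, _ = h(g(q_a_given_bd, q_b, q_d), "a")
# q_a_two and q_a_one: approximately [0.244, 0.756]; q_a_naive: [0.5, 0.5]
```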

Open question: is the joint distribution over all subunit variables (that are not fully independent of the desired augmentation variable $X$) given by the promoted $Q$ variables of all of the subunit ancestors (including non-direct ones) of $X$?

This appears to be the case for all examples we've looked at, but we need to investigate it more rigorously. It would have to follow from the DAG structure of the subunit graph and the resulting conditional (in)dependencies.

Because the intermediary joint variables (diamond nodes) are deterministic functions of their parents, we should be able to just remove them and connect their parents to their children. But we will have to think more about how to implement this approach into Algorithm 2 (augmenting a collapsed model).
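Here is a sketch of that contraction step on a hypothetical `{node: parents}` representation of the augmented collapsed graph. Every intermediary (diamond) node is removed, and each of its children inherits its parents. Chains of intermediaries are resolved recursively. Applied to the two-joint diamond construction, $Q^a$ ends up with $Q^c$, $Q^{b|c}$, $Q^{d|c}$, and $Q^{a|b,d}$ as parents, which is consistent with the open question above.

```python
def remove_intermediaries(parents, intermediaries):
    """Drop deterministic intermediary nodes, wiring their parents to their children.

    ``parents`` maps each node to its set of parents; ``intermediaries`` is the
    set of nodes to remove. Chains of intermediaries are resolved recursively.
    """

    def resolve(node):
        # A regular node stays; an intermediary is replaced by its resolved parents.
        if node not in intermediaries:
            return {node}
        resolved = set()
        for p in parents.get(node, ()):
            resolved |= resolve(p)
        return resolved

    return {
        node: set().union(*(resolve(p) for p in ps))
        for node, ps in parents.items()
        if node not in intermediaries
    }


diamond_2joints = {
    "Q^c": set(), "Q^{b|c}": set(), "Q^{d|c}": set(), "Q^{a|b,d}": set(),
    "Q^{b,c,d}": {"Q^c", "Q^{b|c}", "Q^{d|c}"},
    "Q^{b,d}": {"Q^{b,c,d}"},
    "Q^{a,b,d}": {"Q^{a|b,d}", "Q^{b,d}"},
    "Q^a": {"Q^{a,b,d}"},
}
joints = {"Q^{b,c,d}", "Q^{b,d}", "Q^{a,b,d}"}
print(sorted(remove_intermediaries(diamond_2joints, joints)["Q^a"]))
# ['Q^c', 'Q^{a|b,d}', 'Q^{b|c}', 'Q^{d|c}']
```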

To close, let's revisit the example in the first comment of this issue: a subgraph chain A -> Z -> W, where we want to augment W. Both cases, the single augmented $Q^w$ and the intermediate ("recursive") augmented $Q^z$, follow from different applications of the chain rule in the new formalism. As with the diamond motif example, we can equivalently consider the full joint distribution over the subunits, $\Pr(A, W, Z)$, or the two joint distributions $\Pr(A,Z)$ and $\Pr(Z, W)$. The two different graphs in the new formalism are:

[figure: chain_1joint] for the full joint distribution over subunits, and

[figure: chain_2joints] for the two separate joint distributions. Erasing the intermediary joint distributions (diamond nodes) recovers the two augmented graphs given above in the first comment.