Augmentation Mechanisms

Overview

First, we argue that the augmentation mechanism described in Appendix J.1, as far as we understand it, is not sufficient for more complicated subunit graphs.
We then introduce a more general formalism for augmentation mechanisms. This includes the use of intermediary variables representing joint distributions over subunit variables, as well as decomposing the mechanism function into two distinct pieces.
Open question: from the new formalism, it appears that augmentation mechanisms should generally include ALL subunit ancestors, not just subunit direct ancestors. Is this correct?

Despite confusion about Appendix J.1 (see issue #255), it seems clear that the general form of augmentation mechanisms stated by the authors includes only the ($Q$ variables of the) direct subunit ancestors of the subunit variable being augmented. From our understanding of augmentation, this is not sufficient for arbitrary subunit graphs.

Dependence on non-direct subunit ancestors

For simplicity, consider a subunit graph with a chain structure, A -> Z -> W, and say that we want to augment $Q^w$ (the variable labels here are arbitrary). The HCGM is: chain_HCGM and the collapsed model is: chain_col

We now want to augment $Q^w$. The direct subunit ancestor of W is Z, so from (69) we read the mechanism as $Q^w = f(Q^{w|z}, Q^{z|a})$. However, this is not sufficient to specify the distribution $\Pr(W)$, because it carries no information about A. Working out the probabilities, we have $\Pr(W) = \int \int \Pr(A,W,Z) \, dA \, dZ$, and the chain rule gives $\Pr(A,W,Z) = \Pr(A)\Pr(W,Z | A) = \Pr(A) \Pr(Z|A) \Pr(W| A,Z)$. The chain A -> Z -> W in the subunit graph implies that A and W are independent conditioned on Z, so we can drop A from the last conditional and write $\Pr(A,W,Z) = \Pr(A)\Pr(Z|A)\Pr(W|Z)$. Every factor now corresponds to a promoted $Q$ variable in the collapsed model, so the mechanism for $Q^w$ should be $Q^w = f(Q^a, Q^{z|a}, Q^{w|z})$. The resulting augmented graph is: chain_augmented1
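As a numerical sanity check of this argument, here is a minimal sketch with binary subunit variables. The probability tables and the helper `marginal_w` are hypothetical and chosen only for illustration; they are not taken from the paper or from y0. Holding $Q^{z|a}$ and $Q^{w|z}$ fixed and changing only $Q^a$ changes $\Pr(W)$, so $Q^w$ cannot be a function of $Q^{w|z}$ and $Q^{z|a}$ alone.

```python
import numpy as np

# Hypothetical conditional tables for the chain A -> Z -> W (rows sum to 1).
q_z_given_a = np.array([[0.9, 0.1],   # Pr(Z | A=0)
                        [0.2, 0.8]])  # Pr(Z | A=1)
q_w_given_z = np.array([[0.7, 0.3],   # Pr(W | Z=0)
                        [0.1, 0.9]])  # Pr(W | Z=1)


def marginal_w(q_a, q_z_given_a, q_w_given_z):
    """Pr(W) = sum over a, z of Pr(a) Pr(z | a) Pr(W | z)."""
    return np.einsum("a,az,zw->w", q_a, q_z_given_a, q_w_given_z)


# Same Q^{z|a} and Q^{w|z}, two different choices of Q^a.
p_w_uniform = marginal_w(np.array([0.5, 0.5]), q_z_given_a, q_w_given_z)
p_w_skewed = marginal_w(np.array([0.9, 0.1]), q_z_given_a, q_w_given_z)

print(p_w_uniform)  # approximately [0.43, 0.57]
print(p_w_skewed)   # approximately [0.598, 0.402]
```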

Note that we could also chain together two augmentation variables (as done in Figure A3 (n) in the Appendix). In this case, we first augment $Q^z$ with the mechanism $Q^z = f(Q^a, Q^{z|a})$, and then augment $Q^w$ as $Q^w = f(Q^{w|z}, Q^z)$. This double augmentation gives a different augmented graph, but because augmented variables are deterministic functions of their parents, it is equivalent to the augmented graph above: chain_augmented2 In particular, the augmented variable we really care about, $Q^w$, still depends on $Q^a$, this time through $Q^z$.
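The equivalence of the two routes can be checked numerically. The sketch below again uses hypothetical binary tables (all numbers are assumptions for illustration) and compares the single augmentation with the double augmentation through $Q^z$.

```python
import numpy as np

# Hypothetical promoted Q variables for the chain A -> Z -> W.
q_a = np.array([0.3, 0.7])            # Pr(A)
q_z_given_a = np.array([[0.9, 0.1],   # Pr(Z | A=0)
                        [0.2, 0.8]])  # Pr(Z | A=1)
q_w_given_z = np.array([[0.7, 0.3],   # Pr(W | Z=0)
                        [0.1, 0.9]])  # Pr(W | Z=1)

# Single augmentation: Q^w = f(Q^a, Q^{z|a}, Q^{w|z}).
q_w_single = np.einsum("a,az,zw->w", q_a, q_z_given_a, q_w_given_z)

# Double augmentation: first Q^z = f(Q^a, Q^{z|a}), then Q^w = f(Q^{w|z}, Q^z).
q_z = q_a @ q_z_given_a
q_w_double = q_z @ q_w_given_z

# Both routes give the same marginal over W.
print(np.allclose(q_w_single, q_w_double))
```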

In both cases, $Q^w$ is left with a dependence on $Q^a$, despite A not being a direct subunit ancestor of W in the original HCM.

New formalism for augmentation mechanisms

Because the creation of joint distributions is the crucial step in identifying which variables belong in an augmentation mechanism, it makes the most sense to represent joint distributions explicitly as "intermediary" variables in the collapsed model. We therefore separate augmentation mechanisms into two distinct (deterministic) components. In the original formulation, an augmentation mechanism is given as, e.g., $Q^x = f(\cdot)$, which combines creating a joint variable with marginalizing out everything except $X$. This can be written equivalently as the composition of two functions, $f = h \circ g$, where $g$ creates a joint distribution and $h$ marginalizes out everything except $X$.
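For discrete subunit variables, the decomposition $f = h \circ g$ can be sketched as follows. This is only an illustration under stated assumptions: the representation of a $Q$ variable as a `(names, table)` pair, the Python functions `g`, `h`, and `f`, and the example numbers are all hypothetical, not the y0 API.

```python
import numpy as np


def g(*factors):
    """Product step: multiply factors into a single table.

    Each factor is a pair (names, table) with one array axis per name.
    """
    names = sorted({n for ns, _ in factors for n in ns})
    operands = []
    for ns, table in factors:
        # einsum "sublist" form: each operand is followed by its axis labels.
        operands += [table, [names.index(n) for n in ns]]
    joint = np.einsum(*operands, list(range(len(names))))
    return tuple(names), joint


def h(joint, keep):
    """Marginalization step: sum out every variable not in `keep`."""
    names, table = joint
    drop = tuple(i for i, n in enumerate(names) if n not in keep)
    kept = tuple(n for n in names if n in keep)
    return kept, table.sum(axis=drop)


def f(*factors, keep):
    """Augmentation mechanism as the composition h after g."""
    return h(g(*factors), keep)


# Hypothetical example: X with a single subunit parent Y.
q_y = (("y",), np.array([0.25, 0.75]))
q_x_given_y = (("y", "x"), np.array([[0.6, 0.4],    # Pr(X | Y=0)
                                     [0.2, 0.8]]))  # Pr(X | Y=1)

names, q_x = f(q_y, q_x_given_y, keep={"x"})
print(names, q_x)  # ('x',) and approximately [0.3, 0.7]
```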

Consider the following subunit graph with a diamond motif (C -> B, C -> D, B -> A, D -> A): diamond The promoted $Q$ variables in the collapsed model are $Q^c$, $Q^{b|c}$, $Q^{d|c}$, and $Q^{a | b,d}$. Assume that we want the marginal subunit distribution over $A$, so we want to augment $Q^a$.
Since we already have $\Pr(A | B,D)$, the chain rule gives $\Pr(A) = \int\int \Pr(A, B, D) \, dB \, dD = \int \int \Pr(A| B,D) \Pr(B,D) \, dB \, dD$. This requires the joint distribution $\Pr(B, D)$, which does not factorize into $\Pr(B)\Pr(D)$ because $B$ and $D$ are dependent through their common parent $C$. However, from the conditional independence of $B$ and $D$ given $C$, we have $\Pr(B, C, D) = \Pr(B|C) \Pr(D|C) \Pr(C)$. So we create the intermediary joint variable $Q^{b,c,d} = g(Q^{b|c}, Q^{d|c}, Q^c)$. We can then marginalize out $C$ to get another intermediary joint variable, $Q^{b,d} = h(Q^{b,c,d})$. From the chain rule, we create a final intermediary variable $Q^{a,b,d} = g(Q^{a|b,d}, Q^{b,d})$, and then arrive at our desired augmentation variable $Q^a = h(Q^{a,b,d})$. This is shown graphically as follows:

Grey circle nodes are the original promoted $Q$ variables from collapsing the HCM
Diamond nodes are intermediary joint variables
Square nodes are marginal augmentation variables
Blue nodes are determined from their parents according to the product function $g$
Red nodes are determined from their parent by the marginalization function $h$

diamond_2joints

This formalism with intermediary joint variables is a more intuitive approach to creating augmentation mechanisms, and has a lot of flexibility following the chain rule. For example, a straightforward thing to do is to simply consider the joint distribution over all subunit variables. From the chain rule and conditional independencies of the subunit graph, we have the following for the diamond motif: diamond_1joint
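The two routes for the diamond motif can be compared numerically. The sketch below assumes binary variables and hypothetical tables (every number is an assumption for illustration). It builds $Q^a$ once through the two intermediary joints $Q^{b,c,d}$ and $Q^{a,b,d}$, and once through the single joint over all four subunit variables.

```python
import numpy as np

# Hypothetical promoted Q variables for the diamond C -> B, C -> D, (B, D) -> A.
q_c = np.array([0.4, 0.6])                    # Pr(C)
q_b_given_c = np.array([[0.8, 0.2],           # Pr(B | C=0)
                        [0.3, 0.7]])          # Pr(B | C=1)
q_d_given_c = np.array([[0.5, 0.5],           # Pr(D | C=0)
                        [0.1, 0.9]])          # Pr(D | C=1)
q_a_given_bd = np.array([[[0.9, 0.1],         # Pr(A | B=0, D=0)
                          [0.6, 0.4]],        # Pr(A | B=0, D=1)
                         [[0.5, 0.5],         # Pr(A | B=1, D=0)
                          [0.2, 0.8]]])       # Pr(A | B=1, D=1)

# Route 1: two intermediary joints.
q_bcd = np.einsum("c,cb,cd->bcd", q_c, q_b_given_c, q_d_given_c)  # g
q_bd = q_bcd.sum(axis=1)                                          # h: sum out C
q_abd = np.einsum("bda,bd->abd", q_a_given_bd, q_bd)              # g
q_a_two_joints = q_abd.sum(axis=(1, 2))                           # h: sum out B, D

# Route 2: one joint over all subunit variables.
q_abcd = np.einsum("c,cb,cd,bda->abcd",
                   q_c, q_b_given_c, q_d_given_c, q_a_given_bd)   # g
q_a_one_joint = q_abcd.sum(axis=(1, 2, 3))                        # h

print(np.allclose(q_a_two_joints, q_a_one_joint))

# B and D are dependent through C, so Pr(B, D) is not Pr(B) Pr(D).
q_b, q_d = q_bd.sum(axis=1), q_bd.sum(axis=0)
print(np.allclose(q_bd, np.outer(q_b, q_d)))
```

With these particular tables the first check passes and the second fails, which matches the claim that $\Pr(B, D)$ has to be built from $Q^c$ rather than from separate marginals.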

Open question: is the joint distribution over all subunit variables (that are not fully independent of the desired augmentation variable $X$) given by the promoted $Q$ variables of all of the subunit ancestors (including non-direct ones) of $X$?

This appears to be the case for all examples we've looked at, but we need to investigate it more rigorously. It would have to follow from the DAG structure of the subunit graph and the resulting conditional (in)dependencies.

Because the intermediary and augmentation variables (diamond and square nodes) are deterministic functions of their parents, we can remove any of them as needed by connecting its parents directly to its children.

To close, let's revisit the first example above: the subunit graph with the chain A -> Z -> W, where we want to augment $Q^w$. Both cases, the single augmented $Q^w$ and the version with the intermediate augmented $Q^z$, follow from different applications of the chain rule in the new formalism. As with the diamond motif example, we can equivalently consider the full joint distribution over the subunits, $\Pr(A, W, Z)$, or the two joint distributions $\Pr(A,Z)$ and $\Pr(Z, W)$. The two different graphs in the new formalism are given as:

chain_1joint for the full joint distribution over subunits,
and

chain_2joints for the two separate joint distributions. Erasing the intermediary joint distributions (diamond nodes) recovers the two augmented graphs given above.
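Erasing deterministic nodes can be sketched as a simple graph operation. The code below is a minimal illustration, not the y0 implementation: the helper `remove_node`, the parent-set encoding, and the node labels are hypothetical, and the edge list is our reading of the two-joint chain graph described in the text (the figure itself is not reproduced here).

```python
def remove_node(parents, node):
    """Remove a deterministic node, wiring its parents to its children.

    `parents` maps each node to the set of its parents.
    """
    node_parents = parents[node]
    out = {}
    for child, ps in parents.items():
        if child == node:
            continue
        out[child] = (ps - {node}) | node_parents if node in ps else set(ps)
    return out


# Assumed encoding of the two-joint chain graph: Q^{a,z} and Q^{z,w} are the
# intermediary joints (diamond nodes); Q^z and Q^w are augmentation variables.
parents = {
    "Q^a": set(),
    "Q^{z|a}": set(),
    "Q^{w|z}": set(),
    "Q^{a,z}": {"Q^a", "Q^{z|a}"},
    "Q^z": {"Q^{a,z}"},
    "Q^{z,w}": {"Q^z", "Q^{w|z}"},
    "Q^w": {"Q^{z,w}"},
}

# Erasing both joints gives the double augmentation: Q^w = f(Q^{w|z}, Q^z).
double_aug = remove_node(remove_node(parents, "Q^{a,z}"), "Q^{z,w}")
print(sorted(double_aug["Q^w"]))

# Erasing Q^z as well gives the single augmentation with Q^a as a parent.
single_aug = remove_node(double_aug, "Q^z")
print(sorted(single_aug["Q^w"]))
```

Under this encoding, $Q^a$ ends up as a parent of $Q^w$ once all intermediate deterministic nodes are erased, in line with the first section.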
