Open adamrupe opened 2 months ago
The authors give the general form of augmentation variables as:
I don't understand though why they say script R is the intervention variable; shouldn't it be script L? In fact, in the discussion at the bottom they describe the augmentation variable $q_i^{a|x}(a; do(x))$, even though everywhere else they use $a$ for the intervention variable.
That point aside, I am trying to understand the mechanism structure described in (69). For simple cases, I think I understand why they consider direct subunit ancestors for the parents of the augmented variable. It seems like it is because of the dependencies implied by the subunit graph (which they allude to). For example, consider HCM (p) and its augmented collapsed model (q) in Figure A3. The original promoted $Q$ variables are: $Q^a$, $Q^z$, and $Q^{w|a,z}$. They augment $Q^w$, and evidently its parents / mechanism are all three of the original promoted $Q$ variables just mentioned. Mathematically, we have $\Pr(W) = \int \int \Pr(A,W,Z) dA dZ$. And, $\Pr(A,W,Z) = \Pr(W|A,Z) \Pr(A,Z)$, but from the subunit graph $A$ and $Z$ are independent, so $\Pr(A,Z) = \Pr(A) \Pr(Z)$. We therefore arrive at their given mechanism: $\Pr(W) = \int \int \Pr(W|A,Z) \Pr(A) \Pr(Z) dA dZ$, i.e. $Q^w = m(Q^{w|a,z}, Q^a, Q^z)$.
However, if we consider the HCM in the comment above with a chain structure in the subunit graph: A -> Z -> W, it appears as though it is not sufficient to consider only direct subunit ancestors. Or, equivalently, you may need to augment additional variables for the target augmented variable to be valid.
For visual clarity, let's ignore the unobserved confounder U. The HCGM is: and the collapsed model is
We now want to augment $Q^w$. The direct subunit ancestor of W is Z, so from (69), I am reading the mechanism as: $Q^w = m(Q^{w|z}, Q^{z|a})$. However, this does not seem sufficient to specify the distribution $\Pr(W)$ without any information on A. If we work out the probabilities as we did above, we find that: $\Pr(W) = \int \int \Pr(A,W,Z) dA dZ$, and we can write $\Pr(A,W,Z) = \Pr(A)\Pr(W,Z | A) = \Pr(A) \Pr(Z|A) \Pr(W| A,Z)$. Now, the chain A -> Z -> W in the subunit graph implies that A and W are independent conditioned on Z. So we can drop A from the last conditional and write $\Pr(A,W,Z) = \Pr(A)\Pr(Z|A)\Pr(W|Z)$. This is now stated in terms of promoted $Q$ variables in the collapsed model. So the mechanism for $Q^w$ should be $Q^w = m(Q^a, Q^{z|a}, Q^{w|z})$.
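To make the dependence on $Q^a$ concrete, here is a minimal numerical sketch for a discrete version of the chain. The binary tables and the helper `marginal_w` are hypothetical (made-up numbers, not from the HCM paper or from y0); sums replace the integrals.

```python
# Hypothetical binary tables for the chain A -> Z -> W; the numbers are
# made up for illustration and do not come from the HCM paper.
P_Z_GIVEN_A = [[0.9, 0.1], [0.2, 0.8]]  # P_Z_GIVEN_A[a][z] = Pr(Z=z | A=a)
P_W_GIVEN_Z = [[0.7, 0.3], [0.4, 0.6]]  # P_W_GIVEN_Z[z][w] = Pr(W=w | Z=z)


def marginal_w(p_a):
    """Pr(W=w) = sum_{a,z} Pr(A=a) Pr(Z=z|A=a) Pr(W=w|Z=z)."""
    return [
        sum(
            p_a[a] * P_Z_GIVEN_A[a][z] * P_W_GIVEN_Z[z][w]
            for a in range(2)
            for z in range(2)
        )
        for w in range(2)
    ]


# Same Q^{z|a} and Q^{w|z}, two different choices of Q^a.
pw_uniform_a = marginal_w([0.5, 0.5])
pw_skewed_a = marginal_w([0.1, 0.9])
print(pw_uniform_a, pw_skewed_a)
```

The two printed distributions differ even though the tables for $Q^{z|a}$ and $Q^{w|z}$ are held fixed, which is the sense in which $m(Q^{w|z}, Q^{z|a})$ alone cannot determine $Q^w$.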
From the final product term in (69) it looks like the assumption is that to get the augmented variable mechanism from a joint distribution, you have to integrate out marginal variables, not conditional variables. In the original promoted variables we have $Q^{z|a}$ and not $Q^z$. In the original comment above, we wondered if this means we have to first augment $Q^z$ and then use that to augment $Q^w = m(Q^{w|z}, Q^z)$. If we do this double augmentation, we get a different augmented graph, but due to the deterministic relations between augmented variables and their parents, it is equivalent to the above augmented graph: In particular, the augmented variable we really care about, $Q^w$, still has a dependence on $Q^a$, this time through $Q^z$.
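A quick check of the claimed equivalence, again as a sketch on hypothetical binary tables (the numbers and variable names are mine, chosen only for illustration): augmenting $Q^z$ first and then $Q^w$ should give the same distribution as marginalizing the full joint in one step.

```python
# Hypothetical binary tables for the chain A -> Z -> W (illustrative numbers only).
p_a = [0.3, 0.7]                        # Q^a:     Pr(A=a)
p_z_given_a = [[0.9, 0.1], [0.2, 0.8]]  # Q^{z|a}: p_z_given_a[a][z]
p_w_given_z = [[0.7, 0.3], [0.4, 0.6]]  # Q^{w|z}: p_w_given_z[z][w]

# One step: Q^w = m(Q^a, Q^{z|a}, Q^{w|z}), summing the joint over a and z.
q_w_one_step = [
    sum(
        p_a[a] * p_z_given_a[a][z] * p_w_given_z[z][w]
        for a in range(2)
        for z in range(2)
    )
    for w in range(2)
]

# Two steps: first augment Q^z = m(Q^a, Q^{z|a}), then Q^w = m(Q^{w|z}, Q^z).
q_z = [sum(p_a[a] * p_z_given_a[a][z] for a in range(2)) for z in range(2)]
q_w_two_step = [
    sum(q_z[z] * p_w_given_z[z][w] for z in range(2)) for w in range(2)
]

# The two routes agree up to floating-point rounding.
max_gap = max(abs(x - y) for x, y in zip(q_w_one_step, q_w_two_step))
print(q_w_one_step, q_w_two_step, max_gap)
```

This only verifies the equivalence for one set of tables; the general statement follows from the order of summation being interchangeable.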
But A is not a direct subunit ancestor of W in the original HCM. So I feel like there is something I'm not understanding in J.1 and Equation (69). Are they implicitly assuming no subunit chains like this for valid augmentation variables? It seems like it is still possible to create a valid augmentation variable, as just described, but the mechanism seems more complicated than what is stated in (69).
For the general case, we will need to consider the topological sorting of the subunit level graph in case we need to add several additional augmented variables (recursively) in order for the desired augmentation variable to be valid.
Let $X$ be the desired augmentation variable, and $\text{PA}_S(X)$ be the (direct) subunit parents of $X$.
From the definition of the promoted $Q$ variables, start with
$Q^{x|\text{PA}_S(x)} \rightarrow \Pr(X | \text{PA}_S(X))$
From the chain rule, the joint probability is
$\Pr(X, \text{PA}_S(X)) = \Pr(X | \text{PA}_S(X)) \Pr(\text{PA}_S(X))$
Because joint distributions are not explicitly represented in causal graphs, we have to break down the joint distribution over the subunit parents into conditional and marginal distributions.
A necessary condition to represent the joint distribution of the parents, which the HCM authors implicitly, and incorrectly, assume is always true, is the closure:
$\text{PA}_S(\text{PA}_S(X)) \subseteq \text{PA}_S(X)$
That is, the parents of the parents (i.e. the grandparents) of $X$ are contained in the set of parents of $X$.
If this closure is not satisfied, there will be a conditional distribution over (at least) one of the parents for which we do not have the distribution of the conditioning variable. Therefore, we cannot use the chain rule to represent the joint distribution of $\text{PA}_S(X)$.
To be concrete, let $P \in \text{PA}_S(X)$ be a subunit parent of $X$ and let $R$ be a (subunit) parent of $P$ that is not also a (subunit) parent of $X$: $R \in \text{PA}_S(P)$, $R \notin \text{PA}_S(X)$. Then, the promoted $Q$ variable of $P$ is conditioned on $R$: $Q^{p|r}$ which represents the distribution $\Pr(P | R)$. Since $R$ is not a (subunit) parent of $X$, it is not included in the augmentation mechanism given by the HCM authors (which is only in terms of $\text{PA}_S(X)$ ).
The simplest thing to do then is to recursively augment $P$. This removes the dependence on $R \notin \text{PA}_S(X)$ and gives the desired closure for the mechanism of $Q^x$.
The closure condition above gives the criterion for recursion. If there is a parent $P \in \text{PA}_S(X)$ that has a parent $R \notin \text{PA}_S(X)$, then augment $P$ to create the marginal promoted variable $Q^p$. Disconnect $P$ from its parents in the subunit graph. Now augment $X$ with the newly-added $Q^p$ and the modified subunit graph. Note that the augmentation of $P$ may require adding yet more augmentation variables (hence the recursion).
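As a rough sketch of this recursion (not an implementation of Algorithm 2 and not y0 code), the following assumes the subunit graph is a DAG given as a plain dict from each node to its subunit parents. The function name `augmentation_order` and the dict representation are my own assumptions. It only returns which marginal variables get augmented and in what order; it does not build the mechanisms.

```python
def augmentation_order(subunit_parents, target):
    """Return the variables to augment, in order, ending with `target`.

    `subunit_parents` maps each node to an iterable of its subunit parents
    (assumed to describe a DAG). The input is copied, not mutated.
    """
    graph = {node: set(parents) for node, parents in subunit_parents.items()}
    order = []

    def augment(node):
        pa = set(graph.get(node, ()))
        for p in sorted(pa):
            # Closure check: every parent of p must itself be a parent of node.
            if set(graph.get(p, ())) <= pa:
                continue
            augment(p)        # may recurse further up the graph
            graph[p] = set()  # Q^p is now a marginal: cut p from its parents
        order.append(node)

    augment(target)
    return order


chain = {"A": [], "Z": ["A"], "W": ["Z"]}
collider = {"A": [], "Z": [], "W": ["A", "Z"]}
print(augmentation_order(chain, "W"))     # ['Z', 'W']
print(augmentation_order(collider, "W"))  # ['W']
```

For the chain A -> Z -> W this asks for $Q^z$ before $Q^w$, and for the Figure A3-style collider it asks for $Q^w$ only. Iterating parents in sorted order is an arbitrary choice made here for determinism.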
The above does not consider augmenting conditional variables, which is done in some examples in the HCM paper. Also, there may be multiple parents $P \in \text{PA}_S(X)$ that have parents not in $\text{PA}_S(X)$ and thus need to be themselves augmented. These are different branches in the recursion, and for complicated subunit graphs these branches may interact. It may be that the order matters when augmenting multiple additional variables, and hence we would need to choose or specify some topological sorting of the subunit graph.
This recursive algorithm correctly generates the augmentation mechanism for all non-conditional augmentation variables given in the examples of the HCM paper.
Consider the following subunit graph with a diamond motif:
The promoted $Q$ variables in the collapsed model are:
$Q^c$
$Q^{b|c}$
$Q^{d|c}$
$Q^{a | b,d}$
Assume that we want the marginal subunit distribution over $A$, so we want to augment $Q^a$.
Following the discussion above, from the chain rule we want:
$\Pr(A) = \int\int \Pr(A, B, D) dB dD = \int \int \Pr(A| B,D) \Pr(B,D) dB dD$
since we have $\Pr(A | B,D)$. However, the recursive algorithm described above would give the marginals $Q^b \sim \Pr(B)$ and $Q^d \sim \Pr(D)$, and not the required joint distribution $\Pr(B, D)$. Note that $B$ and $D$ are not independent (they share the common parent $C$), so we cannot just multiply the marginals to get the joint.
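A small numerical illustration of why the marginals are not enough, using hypothetical binary tables for $C$, $B|C$, and $D|C$ (the numbers are made up):

```python
# Hypothetical binary tables for the top of the diamond: C -> B and C -> D.
p_c = [0.5, 0.5]                        # Q^c:     Pr(C=c)
p_b_given_c = [[0.9, 0.1], [0.1, 0.9]]  # Q^{b|c}: p_b_given_c[c][b]
p_d_given_c = [[0.8, 0.2], [0.3, 0.7]]  # Q^{d|c}: p_d_given_c[c][d]

# Marginals, as the marginal-only recursion would produce them.
p_b = [sum(p_c[c] * p_b_given_c[c][b] for c in range(2)) for b in range(2)]
p_d = [sum(p_c[c] * p_d_given_c[c][d] for c in range(2)) for d in range(2)]

# True joint Pr(B, D) = sum_c Pr(B|c) Pr(D|c) Pr(c).
p_bd = [
    [
        sum(p_c[c] * p_b_given_c[c][b] * p_d_given_c[c][d] for c in range(2))
        for d in range(2)
    ]
    for b in range(2)
]

# Product of marginals, which would only equal the joint under independence.
p_b_times_p_d = [[p_b[b] * p_d[d] for d in range(2)] for b in range(2)]

print(p_bd[0][0], p_b_times_p_d[0][0])
```

With these tables the joint entry for $B=0, D=0$ is 0.375 while the product of marginals gives 0.275, so any mechanism for $Q^a$ built from $Q^b$ and $Q^d$ alone would use the wrong $\Pr(B, D)$.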
Because the creation of joint distributions is the crucial step in identifying which variables belong in an augmentation mechanism, it seems to make the most sense to explicitly represent joint distributions as some kind of "intermediary" variables in the collapsed model. We therefore separate augmentation mechanisms into two distinct (deterministic) components. Originally, augmentation mechanisms are given as e.g. $Q^x = f(\cdot)$, which represents the combination of creating a joint variable and then marginalizing everything except $X$. This can be equivalently given as the composition of two functions $f = h \circ g$, where $g$ creates a joint distribution and $h$ marginalizes everything except $X$.
From the conditional independence of $B$ and $D$ given $C$ implied by the subunit graph shown above, we have that $\Pr(B, C, D) = \Pr(B|C) \Pr(D|C) \Pr(C)$, and so we create the intermediary joint variable $Q^{b,c,d} = g(Q^{b|c}, Q^{d|c}, Q^c)$. We can then marginalize out $C$ to get another intermediary joint variable $Q^{b,d} = h(Q^{b,c,d})$. From the chain rule, we can create a final intermediary variable as $Q^{a,b,d} = g(Q^{a|b,d}, Q^{b,d})$, and then arrive at our desired augmentation variable $Q^a = h(Q^{a,b,d})$. This is shown graphically as follows:
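The $g$ / $h$ decomposition can be sketched for binary variables as below. Everything here is an assumption for illustration: the `(scope, table)` tuple representation, the Python functions `g` and `h`, and the numbers in the tables. None of it is y0 API.

```python
from itertools import product


def g(factors, variables):
    """Build a joint table over `variables` by multiplying the given factors."""
    joint = {}
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        prob = 1.0
        for scope, table in factors:
            prob *= table[tuple(env[v] for v in scope)]
        joint[values] = prob
    return tuple(variables), joint


def h(factor, keep):
    """Marginalize a table: sum out every variable not listed in `keep`."""
    scope, table = factor
    idx = [scope.index(v) for v in keep]
    out = {}
    for values, prob in table.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return tuple(keep), out


# Hypothetical promoted Q variables for the diamond; table keys follow the scope order.
q_c = (("C",), {(0,): 0.5, (1,): 0.5})
q_b_c = (("B", "C"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9})
q_d_c = (("D", "C"), {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7})
p_a1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # Pr(A=1 | b, d)
q_a_bd_table = {}
for (b, d), p in p_a1.items():
    q_a_bd_table[(1, b, d)] = p
    q_a_bd_table[(0, b, d)] = 1.0 - p
q_a_bd = (("A", "B", "D"), q_a_bd_table)

# Step-by-step route through the intermediary joint variables.
q_bcd = g([q_b_c, q_d_c, q_c], ("B", "C", "D"))
q_bd = h(q_bcd, ("B", "D"))
q_abd = g([q_a_bd, q_bd], ("A", "B", "D"))
q_a = h(q_abd, ("A",))

# Cross-check against marginalizing the full joint over all four subunits.
q_abcd = g([q_a_bd, q_b_c, q_d_c, q_c], ("A", "B", "C", "D"))
q_a_direct = h(q_abcd, ("A",))
print(q_a[1], q_a_direct[1])
```

Both routes give the same $Q^a$ for these tables, which is what we would expect if the intermediary joint variables are purely a bookkeeping device.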
This formalism with intermediary joint variables is a more intuitive approach to creating augmentation mechanisms, and has a lot of flexibility following the chain rule. For example, a straightforward thing to do is to simply consider the joint distribution over all subunit variables. From the chain rule and conditional independencies of the subunit graph, we have the following for the diamond motif:
This appears to be the case for all examples we've looked at, but we need to investigate it more rigorously. It would have to follow from the DAG structure of the subunit graph and the resulting conditional (in)dependencies.
Because the intermediary joint variables (diamond nodes) are deterministic functions of their parents, we should be able to just remove them and connect their parents to their children. But we will have to think more about how to implement this approach into Algorithm 2 (augmenting a collapsed model).
To close, let's revisit the example in the first comment of this Issue: a subgraph chain A -> Z -> W and we want to augment W. Both cases, with the single augmented $Q^w$ and with the intermediate / "recursive" augmented $Q^z$, follow from different applications of the chain rule in the new formalism. As with the diamond motif example, we can equivalently consider the full joint distribution over the subunits, $\Pr(A, W, Z)$, or the two joint distributions $\Pr(A,Z)$ and $\Pr(Z, W)$. The two different graphs in the new formalism are given as:
for the full joint distribution over subunits,
and
for the two separate joint distributions. Erasing the intermediary joint distributions (diamond nodes) recovers the two augmented graphs given above in the first comment.
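In terms of the $g$ (build a joint) and $h$ (marginalize) maps, my reading of the two routes for the chain is the following; this is a transcription of the two constructions described above, not a formula taken from the HCM paper.

Full joint over the subunits:

$$Q^{a,z,w} = g(Q^a, Q^{z|a}, Q^{w|z}), \qquad Q^w = h(Q^{a,z,w})$$

Two separate joints:

$$Q^{a,z} = g(Q^a, Q^{z|a}), \quad Q^z = h(Q^{a,z}), \quad Q^{z,w} = g(Q^{w|z}, Q^z), \quad Q^w = h(Q^{z,w})$$

In both cases $Q^w$ is a deterministic function of $Q^a$, $Q^{z|a}$, and $Q^{w|z}$, directly in the first route and through $Q^z$ in the second.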
Now some variables that are not in the query need to be augmented, because they are needed to generate the Q-variables in the query.
For example, in this graph,
We want to augment $Q_w$:
But we have $Q_{w|z}$ and $Q_{z|a}$
need: $Q_{w|z}$ and $Q_z$
so then augment $Q_z$ from $Q_{z|a}$ and $Q_a$?
Originally posted by @djinnome in https://github.com/y0-causal-inference/y0/issues/239#issuecomment-2347435292