Open adamrupe opened 1 week ago
I think Figure 6b shows how to handle unobserved subunit variables that connect to a single subunit variable, and Figure 6d shows how to handle unobserved subunit variables that connect to a single unit variable: you just marginalize them out.
But this raises a new question: how do you handle the case where an unobserved subunit variable connects to two or more subunit or unit variables? This is the case of unobserved confounding, and it may require more careful thinking.
I should have clarified, but yes I meant specifically the case of unobserved confounding due to an unobserved subunit variable.
Let's break down the latent subunit confounders into two categories:
For latent subunit confounders without observed parents, we can just follow algorithm 1: For each subunit endogenous variable $v\in \mathcal{S}$
For latent subunit confounders with parents, we can still follow algorithm 1, but it makes a difference whether the parents are subunit variables or unit variables.
If the parents are subunit variables, they are no longer parents of the latent $Q^v$ variable.
If the parents are unit variables, they are disconnected from the subunit variable $v$ and connected to the unit variable $Q^{v|pa_{\mathscr V}}$.
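To make the two parent cases concrete, here is a minimal sketch of how I read the rewiring. It is not the paper's Algorithm 1 or the existing implementation: the function name `promote_subunit_variable`, the string naming of the promoted variable, and the plain-set graph representation are all assumptions for illustration.

```python
def promote_subunit_variable(v, parents, subunits):
    """Sketch of promoting subunit variable `v` to a unit-level Q variable.

    Hypothetical helper, not part of any library.
    `parents` is the set of parents of `v`; `subunits` is the set of all
    subunit variables. Returns the promoted name and its remaining parents.
    """
    # Subunit parents move into the conditioning set of the Q variable
    # and are dropped as graph parents.
    subunit_parents = sorted(p for p in parents if p in subunits)
    # Unit parents are detached from `v` and attached to the Q variable.
    unit_parents = sorted(p for p in parents if p not in subunits)

    if subunit_parents:
        q_name = f"Q^{v}|{','.join(subunit_parents)}"
    else:
        q_name = f"Q^{v}"
    return q_name, unit_parents


# Example: y has a subunit parent a and a unit parent z.
print(promote_subunit_variable("y", {"a", "z"}, {"a", "y"}))
```

Whether `v` is latent only changes whether the resulting Q variable is marked observed, which this sketch does not track.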
Does it matter if you have a chain of latent confounding variables? To be specific, if a subunit variable has unobserved subunit parents, its promoted Q variable is unobserved. Does it matter that this promoted Q variable is not connected to the promoted Q variable of its subunit parent?
I think the conditions in the Algorithm 1 pseudocode in the paper might sufficiently cover this. Now that I'm splitting up creating HCGMs and then collapsing in the code for Algorithm 1, I understand this better. Edges in the collapsed model (specifically undirected edges) are all at the unit level, and the Algorithm 1 pseudocode outlines whether the promoted Q variables are observed or not. Then the undirected edges are created based on unobserved unit variables (including Q variables).
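For the chain question, here is a rough sketch of how undirected edges could be derived from unobserved unit variables in a way that also follows chains of latents. This is a standard latent-projection style pass that I am swapping in for illustration, not the code in the repository; the function names and the dict-of-children graph format are assumptions, and it only produces undirected edges (directed edges that pass through latents are ignored).

```python
from itertools import combinations


def observed_reachable_via_latents(children, start, latent):
    """Observed nodes reachable from `start` along directed paths
    whose intermediate nodes are all latent."""
    found, seen, stack = set(), {start}, [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child in latent:
                # Keep walking through latent nodes (this handles chains).
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
            else:
                found.add(child)
    return found


def undirected_edges(children, latent):
    """Pairs of observed nodes that share a latent ancestor through
    latent-only paths. Hypothetical helper for illustration."""
    edges = set()
    for node in latent:
        reached = observed_reachable_via_latents(children, node, latent)
        edges.update(combinations(sorted(reached), 2))
    return edges


# A chain of two latents: U1 -> U2 -> Y and U1 -> A.
children = {"U1": ["U2", "A"], "U2": ["Y"], "A": ["Y"]}
print(undirected_edges(children, latent={"U1", "U2"}))
```

Under this reading, a chain of latent Q variables still yields the undirected edge between `A` and `Y`, but only if the latent-to-latent edge is present in the collapsed graph, which is exactly the open question above.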
How do we want to handle unobserved subunit variables? They are discussed in Algorithm 1, but none of the examples have unobserved subunits. The current implementation of Algorithm 1, collapse_HCM, raises a ValueError saying that they are not currently supported. If we allow unobserved subunit variables in HCMs that are input to Algorithm 1 for collapse, we will need to decide how to handle their edges (i.e. whether they connect only to other subunit variables or also connect to unit variables).
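As a starting point for that decision, something like the following could separate the two edge cases before raising. This is only a sketch: it does not reproduce the actual check inside collapse_HCM, and the function names, the labels, and the dict-of-children input are assumptions.

```python
def classify_unobserved_subunits(children, subunits, observed):
    """Label each unobserved subunit variable by what its children are.

    Hypothetical helper. Returns a dict mapping the variable to
    'subunit-only' (all children are subunit variables) or
    'touches-unit' (at least one child is a unit variable).
    """
    labels = {}
    for var in sorted(subunits - observed):
        kids = set(children.get(var, ()))
        labels[var] = "subunit-only" if kids <= subunits else "touches-unit"
    return labels


def require_observed_subunits(children, subunits, observed):
    """Mimic the current behavior in spirit: reject unobserved subunits,
    but report which edge case each one falls into."""
    unsupported = classify_unobserved_subunits(children, subunits, observed)
    if unsupported:
        raise ValueError(
            f"Unobserved subunit variables are not currently supported: {unsupported}"
        )


# u is an unobserved subunit confounder of subunit a and unit Y.
children = {"u": ["a", "Y"], "a": ["Y"]}
print(classify_unobserved_subunits(children, {"u", "a"}, {"a", "Y"}))
```

Keeping the classification separate from the error would let the 'subunit-only' case be enabled first if that turns out to be the easier one.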