y0-causal-inference / y0

❓y0 (pronounced "why not?") is for causal inference in Python
https://y0.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Unobserved subunit confounder variables #241

Open adamrupe opened 1 week ago

adamrupe commented 1 week ago

How do we want to handle unobserved subunit variables? They are discussed in Algorithm 1, but none of the examples have unobserved subunits. The current implementation of Algorithm 1, collapse_HCM, raises a ValueError saying that they are not currently supported. If we allow unobserved subunit variables in HCMs that are passed to Algorithm 1 for collapse, we will need to decide how to handle their edges (i.e., whether they connect only to other subunit variables or also to unit variables).

djinnome commented 1 week ago

I think figure 6b shows how to handle an unobserved subunit variable that connects to a single subunit variable, and figure 6d shows how to handle one that connects to a single unit variable: you just marginalize it out.

But this raises a new question: how do you handle the case where an unobserved subunit variable connects to two or more subunit or unit variables? This is the case of unobserved confounding, and it may require more careful thinking.

adamrupe commented 4 days ago

I should have clarified, but yes I meant specifically the case of unobserved confounding due to an unobserved subunit variable.

djinnome commented 4 days ago

Let's break down the latent subunit confounders into two categories:

  1. Those without observed parents
  2. Those with observed parents.

For latent subunit confounders without observed parents, we can just follow Algorithm 1. For each subunit endogenous variable $v\in \mathcal{S}$:

  1. Create a unit endogenous variable $Q^v$
  2. Mark $Q^v$ as hidden
  3. For each direct unit descendant $w \in \text{dd}_v(\mathcal{S})$, connect $Q^v$ to $X^w$
  4. Erase the subunit variable $X^v$
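
The four steps above can be sketched in plain Python. This is only an illustration, not the y0 implementation: the edge-set representation, the function names, and the `Q_<name>` naming scheme are all made up here, and it assumes a "direct unit descendant" is a unit variable reachable from $v$ through a path whose intermediate nodes are all subunit variables.

```python
def direct_unit_descendants(edges, subunits, v):
    """Unit nodes reachable from v via paths whose intermediate nodes are all subunit nodes.

    This reading of "direct unit descendant" is an assumption of the sketch.
    """
    found, seen, stack = set(), {v}, [v]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src != node or dst in seen:
                continue
            seen.add(dst)
            if dst in subunits:
                stack.append(dst)  # keep walking through subunit nodes
            else:
                found.add(dst)  # stop at the first unit node on this path
    return found


def promote_latent_subunit(edges, subunits, hidden, v):
    """Apply steps 1-4 to one latent subunit variable v without observed parents."""
    q = f"Q_{v}"  # step 1: create the unit variable (hypothetical naming scheme)
    targets = direct_unit_descendants(edges, subunits, v)
    new_edges = {(s, d) for s, d in edges if v not in (s, d)}  # step 4: erase v
    new_edges |= {(q, w) for w in targets}  # step 3: connect Q^v to each w
    new_hidden = (hidden - {v}) | {q}  # step 2: mark Q^v as hidden
    return new_edges, subunits - {v}, new_hidden


# Latent subunit U confounds unit variables Y and Z; X -> Y is an ordinary unit edge.
edges = {("U", "Y"), ("U", "Z"), ("X", "Y")}
new_edges, new_subunits, new_hidden = promote_latent_subunit(edges, {"U"}, {"U"}, "U")
print(sorted(new_edges))
```

In this toy case the hidden `Q_U` ends up as a common parent of `Y` and `Z`, which is exactly the unobserved-confounding situation under discussion.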

For latent subunit confounders with parents, we can still follow algorithm 1, but it makes a difference whether the parents are subunit variables or unit variables.

If the parents are subunit variables, they are no longer parents of the latent $Q^v$ variable.

If the parents are unit variables, they are disconnected from the subunit variable $v$ and connected to the unit variable $Q^{v|pa_{\mathscr V}}$.
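
One possible reading of these two rules, as a rough sketch only: the function name, the edge-set representation, and the `Q_U` label are hypothetical and not taken from y0. Incoming edges from subunit parents are dropped, and incoming edges from unit parents are redirected to the promoted Q variable.

```python
def rewire_parents(edges, subunits, v, q_name):
    """Move the unit parents of latent subunit v onto its promoted Q variable.

    Subunit parents are dropped, unit parents are reconnected to q_name.
    Outgoing edges of v are left untouched; they belong to a separate step.
    """
    parents = {src for src, dst in edges if dst == v}
    unit_parents = parents - subunits
    kept = {(src, dst) for src, dst in edges if dst != v}  # remove every edge into v
    return kept | {(p, q_name) for p in unit_parents}


# U is a latent subunit variable with a subunit parent A and a unit parent W.
example_edges = {("A", "U"), ("W", "U"), ("U", "Y")}
rewired = rewire_parents(example_edges, subunits={"A", "U"}, v="U", q_name="Q_U")
print(sorted(rewired))
```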

adamrupe commented 3 days ago

Does it matter if you have a chain of latent confounding variables? To be specific, if a subunit variable has unobserved subunit parents, its promoted Q variable is unobserved. Does it matter that this promoted Q variable is not connected to the promoted Q variable of its subunit parent?

I think the conditions in the Algorithm 1 pseudo code in the paper might sufficiently cover this. Now that I'm splitting up creating HCGMs and then collapsing in the code for Algorithm 1, I understand this better. Edges in the collapsed model (specifically undirected edges) are all at the unit level, and the Algorithm 1 pseudo code outlines whether the promoted Q variables are observed or not. Then the undirected edges are created based on unobserved unit variables (including Q variables).
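
To make the last step concrete, here is a simplified sketch of turning hidden unit variables into undirected edges. It is not the y0 code: the function name and the graph representation are invented for illustration, and it only covers hidden variables with no hidden parents or children. A chain of latent variables would need a full latent projection, which this does not attempt.

```python
from itertools import combinations


def project_hidden(directed, hidden):
    """Replace each hidden node by undirected edges between its children.

    Simplification (assumption): hidden nodes are not adjacent to other hidden nodes.
    """
    undirected = set()
    for h in hidden:
        children = sorted(dst for src, dst in directed if src == h)
        # every pair of children of the same hidden node becomes confounded
        undirected |= {frozenset(pair) for pair in combinations(children, 2)}
    kept = {(s, d) for s, d in directed if s not in hidden and d not in hidden}
    return kept, undirected


# Q_U is a hidden promoted variable pointing at the unit variables Y and Z.
collapsed = {("Q_U", "Y"), ("Q_U", "Z"), ("X", "Y")}
kept_edges, undirected_edges = project_hidden(collapsed, hidden={"Q_U"})
print(sorted(kept_edges), [sorted(e) for e in undirected_edges])
```

A hidden variable with a single child contributes no undirected edge here, which matches the "just marginalize it out" case from earlier in the thread.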