mschauer / CausalInference.jl

Causal inference, graphical models and structure learning in Julia
https://mschauer.github.io/CausalInference.jl/latest/

WIP: Algorithm for counting the number of consistent extensions (size of an MEC given a CPDAG) #113

Open mwien opened 1 year ago

mwien commented 1 year ago

The next step, I think, is adding the following methods, most of which are already implemented, but I'm still thinking about the proper way to integrate them (and need some time to polish them):

First, some explanation of what the difference between PDAGs and CPDAGs is (as I use the terms) and why it is relevant: A CPDAG represents a Markov equivalence class (MEC) and has the same skeleton and v-structures as the DAGs in the MEC, plus additional oriented edges that follow from the v-structures by the Meek rules. Hence, for the MEC {a -> b -> c, a <- b <- c, a <- b -> c} the CPDAG would be a - b - c.

In contrast, a <- b - c is not a CPDAG (even though it is fully oriented under the Meek rules) because it does not represent (via the definition of consistent extensions) a full Markov equivalence class, but only the two DAGs {a <- b -> c, a <- b <- c}, which form a subclass of the one above. (Such graphs, which are completed under the Meek rules but not necessarily CPDAGs, are also called MPDAGs.)
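To make the two examples above concrete, here is a small brute-force sketch (illustration only; exponential in the number of undirected edges, and the names `consistent_extensions`, `vstructures`, `iscyclic` are mine, not part of the CausalInference.jl API) that enumerates the consistent extensions of a partially directed graph given as lists of directed and undirected edges:

```julia
# Brute-force reference: enumerate all consistent extensions of a partially
# directed graph over vertices 1:n, given as lists of directed and undirected
# edges. A consistent extension orients every undirected edge such that the
# result is acyclic and has no v-structure beyond those of the input.

# all v-structures x -> z <- y (x, y non-adjacent) of a directed edge list
function vstructures(n, dir, adj)
    vs = Set{NTuple{3,Int}}()
    for (x, z) in dir, (y, z2) in dir
        z == z2 && x < y && !adj(x, y) && push!(vs, (x, z, y))
    end
    return vs
end

# cycle check on a directed edge list via Kahn's algorithm
function iscyclic(n, dir)
    indeg = zeros(Int, n)
    for (_, v) in dir
        indeg[v] += 1
    end
    stack = [v for v in 1:n if indeg[v] == 0]
    seen = 0
    while !isempty(stack)
        u = pop!(stack)
        seen += 1
        for (a, b) in dir
            a == u || continue
            indeg[b] -= 1
            indeg[b] == 0 && push!(stack, b)
        end
    end
    return seen < n
end

function consistent_extensions(n, directed, undirected)
    skel = Set((min(a, b), max(a, b)) for (a, b) in [directed; undirected])
    adj(a, b) = (min(a, b), max(a, b)) in skel
    vs0 = vstructures(n, directed, adj)
    exts = Vector{Vector{Tuple{Int,Int}}}()
    for bits in 0:(2^length(undirected) - 1)        # all orientations of the undirected edges
        dir = copy(directed)
        for (i, (a, b)) in enumerate(undirected)
            push!(dir, (bits >> (i - 1)) & 1 == 1 ? (a, b) : (b, a))
        end
        if !iscyclic(n, dir) && vstructures(n, dir, adj) == vs0
            push!(exts, dir)
        end
    end
    return exts
end

# the two three-vertex examples from above, with a = 1, b = 2, c = 3
length(consistent_extensions(3, Tuple{Int,Int}[], [(1, 2), (2, 3)]))  # 3: the full MEC of the CPDAG a - b - c
length(consistent_extensions(3, [(2, 1)], [(2, 3)]))                  # 2: a <- b - c only represents a subclass
```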

The first method for counting the number of consistent extensions only works for CPDAGs (and some generalizations), but not for PDAGs or MPDAGs in general. In particular, applying it to a graph which is not a CPDAG could simply give the wrong result (e.g., there are x consistent extensions, but the algorithm will spit out some other number y). The reason for this is that CPDAGs have a number of nice properties that the algorithm relies on (e.g. that a - b - c implies that there cannot be a -> c, which we use in GES; this holds neither for PDAGs nor for MPDAGs).
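As a tiny illustration of that last property (reusing the brute-force helper sketched above, not the fast counting method): a -> c together with a - b - c cannot occur in a CPDAG, yet it is a perfectly valid PDAG with consistent extensions, so a method relying on that property could silently return a wrong count for such an input.

```julia
# a -> c together with a - b - c (a = 1, b = 2, c = 3): impossible in a CPDAG,
# but a valid PDAG with three consistent extensions
length(consistent_extensions(3, [(1, 3)], [(1, 2), (2, 3)]))  # 3
```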

This is all fine for the GES algorithm, which is guaranteed to output a CPDAG, but the PC algorithm might not. The PC algorithm could (when mistakes happen during independence testing) output graphs which are not CPDAGs. Actually, this happens extremely often (in my experience), and the graph obtained by learning the skeleton and the v-structures moreover often has no consistent extension at all (i.e., Dor-Tarsi would throw an error for such an input in the current implementation).

It has to be said that applying the Meek rules to a PDAG which has no consistent extension might make the graph "extendable" again: e.g., after the skeleton and v-structure phase one might have (due to errors in independence testing) a -> b <- c and g -> f <- h and b - d - e - f. Then, applying the Meek rules will fully orient the chain b - d - e - f (in which direction depends on the implementation and vertex numbering). The graph is then a DAG and thus trivially has itself as a consistent extension (what happened here is that the Meek rules introduced a new v-structure; this can happen if the given PDAG is not extendable).
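Checking this example with the brute-force helper from the sketch above (the vertex numbering a, ..., h = 1, ..., 8 is just a choice for illustration, and the chain orientation shown is one possible outcome of the Meek rules, not the one a particular implementation will pick):

```julia
# the graph after the skeleton and v-structure phase of the example above
directed   = [(1, 2), (3, 2), (7, 6), (8, 6)]   # a -> b <- c and g -> f <- h
undirected = [(2, 4), (4, 5), (5, 6)]           # chain b - d - e - f
length(consistent_extensions(8, directed, undirected))       # 0: no consistent extension

# one possible outcome of the Meek rules: b -> d, d -> e, f -> e,
# which introduces the new v-structure d -> e <- f
oriented = [directed; [(2, 4), (4, 5), (6, 5)]]
length(consistent_extensions(8, oriented, Tuple{Int,Int}[])) # 1: a DAG, its own consistent extension
```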

The most unopinionated way of including the methods is to just give a proper warning about the output of the PC algorithm. One could also add a check to the functions that expect a CPDAG, i.e., they would first test whether the graph is a CPDAG/fulfills the necessary conditions for the algorithm to produce the correct result (but that would of course make the methods slower).

Generally, I think that something along those lines would be fine (and in any case, I'm probably going to polish the algorithms themselves for a while before we have to settle on a solution to this), but I wanted to draw attention to this problem.

There are a lot of practical methods built on top of observational causal discovery which, to a greater or lesser degree, demand the output graph to be a CPDAG, and it's not great that the sample version of the PC algorithm does not provide one. I guess the reason why people still often prefer standard PC is that every orientation in the output graph comes (more or less) directly from conditional independencies, which via the Markov property have a rather clear causal interpretation (though even here, applying the Meek rules might introduce (arbitrary) new v-structures in case of errors). If one were to, e.g., try to augment the output of PC to be a CPDAG, the causal interpretation would certainly be less clear.

mschauer commented 1 year ago

Ah, that's good timing, you can use exactly my Dor and Tarsi algorithm for the is_cpdag check. I should make a version which doesn't throw an error in that case. For the planning, I am going to look at Markov chains on equivalence classes of causal graphs (or on CPDAGs) next, building up from GES, similar to https://projecteuclid.org/journals/annals-of-statistics/volume-41/issue-4/Reversible-MCMC-on-Markov-equivalence-classes-of-sparse-directed-acyclic/10.1214/13-AOS1125.full but pushing it a bit further. This is needed to sample posteriors over CPDAGs, for example using the Metropolis-Hastings algorithm. Is this interesting to you? For this I'll need to count the number of CPDAGs which can be reached by a transition move, to ensure a reversible Markov chain.

(Link is down: here is the arXiv version https://arxiv.org/abs/1209.5860)

mwien commented 1 year ago

Ah, that's good timing, you can use exactly my Dor and Tarsi algorithm for the is_cpdag check

Yep, that fits well, I'm very happy with the general direction (also adding GES etc.). Btw, you can also check whether a graph is a CPDAG in linear time (which you wouldn't get with Dor-Tarsi), as it is possible to construct a consistent extension of a CPDAG in linear time (if the input graph is not a CPDAG, this algorithm will just output a garbage DAG, and DAG-to-CPDAG will then find some other graph not equal to the input one, so the check still works). Maybe I will add this as well...

Regarding sanity checks: Testing whether the output of sample PC is a CPDAG sounds fine and is a safe choice. It will likely (at least for moderately large graphs) almost always find that the graph is not a CPDAG. Well, let's just see how this develops...

Generally, I want to keep expectations about the timeline of this PR low, as I have to finish my PhD thesis 😅

For the planning, I am going to look at Markov chains on equivalence classes of causal graphs (or on CPDAGs) next, building up from GES

That's very interesting! Now that you mention sampling, I totally forgot to add above that I also have code to uniformly sample a member of an MEC efficiently (it is based on the counting method and is an exact sampler) 😅
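(For reference, the brute-force helper from the earlier sketch already gives a trivially correct, if exponential, uniform sampler: pick uniformly from the enumerated extensions. The efficient sampler mentioned here is of course based on the counting method instead.)

```julia
# brute-force uniform sampling of a member of the MEC of the CPDAG a - b - c,
# reusing consistent_extensions from the sketch further up (illustration only)
rand(consistent_extensions(3, Tuple{Int,Int}[], [(1, 2), (2, 3)]))
```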

My prior research was kind of orthogonal to (counting or) sampling CPDAGs (focussing on the combinatorial and algorithmic aspects of a single MEC, not on the space of MECs). At some point I skimmed the paper https://projecteuclid.org/journals/annals-of-statistics/volume-41/issue-4/Reversible-MCMC-on-Markov-equivalence-classes-of-sparse-directed-acyclic/10.1214/13-AOS1125.full and I also thought about some related problems, but never really got far. Counting the number of "adjacent" CPDAGs sounds fun; is the current approach to test every possible operation and check whether it gives a consistent extension/new CPDAG? I would be interested in contributing/thinking further about this...

mschauer commented 1 year ago

Counting the number of "adjacent" CPDAGs sounds fun; is the current approach to test every possible operation and check whether it gives a consistent extension/new CPDAG?

Strange, 10.1214/13-AOS1125 does nothing like that, see Algorithm 1, and therefore samples proportionally to the number of CPDAG neighbours a CPDAG has.

mschauer commented 1 year ago

Correction: They do count or estimate the number of moves for each CPDAG $M_t$ and can use this later to account for the chain $c_t$ visiting CPDAGs with a large number of neighbours more often.
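For context, a generic sketch of the mechanism (not the algorithm of the paper; `neighbours` and `logtarget` are hypothetical user-supplied functions): when the proposal is uniform over the neighbours of the current state, the neighbour counts enter the Metropolis-Hastings acceptance ratio, which is exactly why counting the reachable CPDAGs matters for getting the intended stationary distribution.

```julia
# One Metropolis-Hastings step on a discrete space with a uniform-over-neighbours
# proposal; `neighbours(x)` returns the states reachable from x by one move and
# `logtarget(x)` the unnormalized log posterior (both assumed, user-supplied).
function mh_step(x, neighbours, logtarget)
    nx = neighbours(x)
    y = rand(nx)                       # propose uniformly among the neighbours of x
    ny = neighbours(y)
    # q(x | y) / q(y | x) = |N(x)| / |N(y)| corrects for unequal neighbour counts
    logα = logtarget(y) - logtarget(x) + log(length(nx)) - log(length(ny))
    return log(rand()) < logα ? y : x  # accept with probability min(1, exp(logα))
end
```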

mwien commented 1 year ago

What I would find interesting, in addition to the settings in the paper, would be to sample MECs/CPDAGs with some control over the number of v-structures. E.g., similar to the parameter $n$ which sets the maximum number of edges, we could have a parameter $k$ for the maximum number of v-structures.

I find this rather natural because MECs are determined by skeleton and v-structures, so why not play around with the latter as well. This is also in contrast to what is currently done, e.g., in benchmarking causal discovery algorithms, where usually random DAGs are sampled, which by construction have an extremely large number of v-structures (provided they are not too sparse). In the random DAG model typically used, if you have a collider a -> b <- c (the number of which grows quadratically with the indegree), the probability of it being shielded is basically the edge probability $p$, as every edge is drawn independently.
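A quick simulation of that last claim, as a minimal sketch assuming the usual random-DAG model (fix a topological order and draw each forward edge independently with probability $p$); the fraction of shielded colliders should then come out close to $p$:

```julia
using Random

# Count colliders a -> b <- c and how many of them are shielded (a, c adjacent)
# in a random DAG on n vertices: fixed topological order 1, ..., n, each forward
# edge drawn independently with probability p.
function collider_stats(n, p; rng = Random.default_rng())
    A = [i < j && rand(rng) < p for i in 1:n, j in 1:n]   # edge i -> j iff i < j and A[i, j]
    total = shielded = 0
    for b in 3:n, a in 1:(b - 1), c in (a + 1):(b - 1)
        A[a, b] && A[c, b] || continue                    # collider a -> b <- c
        total += 1
        shielded += A[a, c]                               # shielded iff a and c are adjacent
    end
    return total, shielded
end

total, shielded = collider_stats(200, 0.1)
shielded / total   # ≈ 0.1, i.e. roughly the edge probability p
```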

mschauer commented 1 year ago

Let's have a chat on https://julialang.zulipchat.com about it if you like.