py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.
https://causal-learn.readthedocs.io/en/latest/
MIT License
1.04k stars, 174 forks

Using FCI with true graph known, specifying latent variables and background knowledge? #162

Closed: ungvilde closed this issue 4 months ago

ungvilde commented 5 months ago

Hello!

I want to study the output of the FCI algorithm when I have background knowledge and latent variables. Let G_true be a DAG, and let G_observed be the subgraph of G_true that contains only the observed variables. Then something like this would be desirable:

fci(np.empty(shape=(1, number_of_observed_nodes)),
    d_separation,
    true_dag=G_true,
    observed_nodes=[list of observed nodes],  # these are the nodes in G_observed
    background_knowledge=bk  # background knowledge relevant to G_observed
    )

I am basically interested in exploring the possible number of orientable edges for a partially observed DAG, when there is background knowledge. Is this something that could be possible to do with causal-learn? :)

Thank you very much.

kunwuz commented 5 months ago

Yes, and you can check how to incorporate background_knowledge here: https://github.com/py-why/causal-learn/blob/main/tests/TestBackgroundKnowledge.py. The following code would be enough, where the data is of the shape (# of samples, # of observed variables):

g, edges = fci(data, independence_test_method, background_knowledge=background_knowledge)

ungvilde commented 5 months ago

@kunwuz Thanks for the quick reply!

However, I was hoping to find a method for using the FCI algorithm with a CI oracle. As far as I understand it, doing

fci(data, independence_test_method, background_knowledge=background_knowledge)

will use CI tests based on the data sample and the chosen method.

But if G is a DAG with N variables, then I can do

fci(np.empty(shape=(1, N)), d_separation, true_dag=G, background_knowledge=bk)

to get the FCI output with a CI oracle for G, incorporating background knowledge. But what if G is only partially observed? In other words, what if G is a subgraph of some other DAG G_true?

Thanks again.

EDIT: The reason I want to do this is to compare the theoretical results of the algorithm with the finite sample result. The structures I am studying are known to have latent variables, so I want to see how they affect the results.

kunwuz commented 5 months ago

Hi, I think it might be possible but can't be sure for now since I haven't tried FCI with a CI Oracle for some time. A quick option, as suggested by @jdramsey, would be to try tetrad (or py-tetrad as a Python wrapper) for this. Perhaps Joe (@jdramsey) could provide a better response?

jdramsey commented 5 months ago

Sure, this is something that's actually very easy to do in Tetrad (or py-tetrad); I could show you the basic idea if you like. The basic idea is you'd make the full DAG with all of the variables but set the type of some of the nodes to Latent. Then you define your background knowledge and do an FCI search with the knowledge using the "MsepTest." That will give you the ideal PAG you're looking for, I think.
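To make the idea of a CI oracle over a partially observed DAG concrete, here is a minimal plain-Python sketch. It is not Tetrad or causal-learn code: the function names (ancestors, d_separated, make_oracle) and the example graph are hypothetical, chosen only for illustration. It assumes the DAG is given as a dict mapping each node to its parents, and it checks d-separation with the standard moralization criterion. The oracle evaluates d-separation in the full G_true but refuses any query that mentions a latent node.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, x, y, z):
    """True if x and y are d-separated given the set z (x != y, both not in z).

    Moralization criterion: restrict to the ancestors of {x, y} and z,
    connect co-parents, drop edge directions, delete z, test reachability.
    """
    z = set(z)
    keep = ancestors(parents, {x, y} | z)
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))  # parents of a kept node are kept too
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):  # "marry" parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in adj[v] - z:
            if w == y:
                return False  # an unblocked connection exists
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


def make_oracle(parents, observed):
    """Build a CI oracle on G_true that only accepts observed variables."""
    observed = set(observed)

    def oracle(x, y, z=()):
        if not ({x, y} | set(z)) <= observed:
            raise ValueError("oracle queries may only mention observed variables")
        return d_separated(parents, x, y, z)

    return oracle


# Hypothetical G_true: L is a latent common cause of A and B, and A -> C <- B.
G_true = {"A": ["L"], "B": ["L"], "C": ["A", "B"], "L": []}
oracle = make_oracle(G_true, observed=["A", "B", "C"])

print(oracle("A", "B"))                     # no observed set separates A and B
print(oracle("A", "B", ["C"]))
print(d_separated(G_true, "A", "B", {"L"}))  # conditioning on the latent does
```

In this example A and B stay dependent under every conditioning set an FCI search over the observed variables could ask about, which is the kind of pattern a PAG is meant to represent. How such an oracle is wired into a particular library is a separate question that this sketch does not address.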

ungvilde commented 5 months ago

Thanks for the reply, @jdramsey. That sounds exactly like what I want to do! It could actually be really helpful if you showed me the basic idea... Only if it is not too much trouble, of course.

jdramsey commented 5 months ago

Absolutely! I'll type out some py-tetrad code and send it by tomorrow. You should be able to morph it into your example.

ungvilde commented 5 months ago

Thank you, I really appreciate it!

jdramsey commented 5 months ago

@ungvilde Here you go; I put an answer here:

https://github.com/cmu-phil/py-tetrad/issues/18

As I said there, if you need to do this a different way, let me know.