py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.
https://causal-learn.readthedocs.io/en/latest/
MIT License
1.04k stars 174 forks source link

Questions about is_dseparated_from #160

Open kenneth-lee-ch opened 5 months ago

kenneth-lee-ch commented 5 months ago

I have the following GeneralGraph() object named D with the following adjacency matrix (from D.graph) and I use is_dseparated_from to check whether nodes 0 and 2 are d-separated given an empty set. The function is_dseparated_from seems to get stuck somewhere and it never returns. Is this expected?

[[ 0  0  0  0  0  1  0  1  1  0]
 [ 0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  1  0  0  0]
 [ 0  0  0  0  0  0  1  0  1  0]
 [ 0  0  0  0  0 -1  0 -1  0  0]
 [ 1  0  0  0  1  0  0  2  0  1]
 [ 0  0  1  1  0  0  0  0  0  1]
 [ 1  0  0  0  1  2  0  0  0  1]
 [ 1  0  0  1  0  0  0  0  0  0]
 [ 0  0  0  0  0  1  1  1  0  0]]
kenneth-lee-ch commented 5 months ago

I found a recursive behavior in is_dconnected_to. Suppose we have the following graph D:

X4->X1 o-o X2<->X5 ; X4->X2 

If you call is_dconnected_to(X1, X5, [], D), this will make the function to loop forever.

kenneth-lee-ch commented 5 months ago

I currently add a list prevent_recursive_ls to the function is_dconnected_to to bypass the pair of edge and node that have been previously visited as a workaround to mitigate the issue as shown below:

def is_dconnected_to(node1: Node, node2: Node, z: List[Node], graph: Graph):
    if node1 == node2:
        return True
    edgenode_deque = deque([])
    prevent_recursive_ls = [] # MODIFIED HERE 
    for edge in graph.get_node_edges(node1):
        if edge.get_distal_node(node1) == node2:
            return True 
        edgenode_deque.append((edge, node1))
    while len(edgenode_deque) > 0:
        edge, node_a = edgenode_deque.pop()
        node_b = edge.get_distal_node(node_a)
        for edge2 in graph.get_node_edges(node_b):

            node_c = edge2.get_distal_node(node_b)

            # node_a - node_b - node_c
            if node_c == node_a:
                continue

            if reachable(edge, edge2, node_a, z, graph):
                if node_c == node2:
                    return True
                else:
                    if (edge2, node_b) not in prevent_recursive_ls: # MODIFIED HERE 
                        edgenode_deque.append((edge2, node_b))
                        prevent_recursive_ls.append((edge2, node_b)) # MODIFIED HERE 

    return False
jdramsey commented 5 months ago

This is interesting, if I may interject. First of all, this isn't a legal PAG; the tail endpoints at X4 shouldn't be there in a PAG, or at least not both of them, so maybe this was obtained using background knowledge...? Second, the recursion error doesn't occur in Tetrad, even though it's also using a reachability method. Third, for a latent variable model, the right test is m-separation, not d-separation, but you should laugh at anyone who says that since they are the same algorithm (just differently interpreted).