py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.
https://causal-learn.readthedocs.io/en/latest/
MIT License
1.04k stars, 174 forks

Adding background knowledge using FCI #136

Open asha24choudhary opened 9 months ago

asha24choudhary commented 9 months ago

Hi. I am using FCI with background knowledge. I have a dataframe with 287 features. I first run FCI without background knowledge:

cg_without_background_knowledge = fci(merged_data1.to_numpy(), node_names=merged_data1.columns)

The output is a tuple of two elements: the graph and its edges.

I then get the nodes in this way: nodes = cg_without_background_knowledge[0].get_nodes()

I print the node names as:

for node in cg_without_background_knowledge[0].nodes: 
    print(node.get_name())

The output is X1, X2, ..., X287. I want to add background knowledge, and I tried the following two methods:

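from causallearn.graph.GraphNode import GraphNode
from causallearn.utils.PCUtils.BackgroundKnowledge import BackgroundKnowledge

## 1) Forbid edges using new GraphNode objects built from feature names
bk = BackgroundKnowledge()
for source_name, target_name in node_pairs:
    bk.add_forbidden_by_node(GraphNode(source_name), GraphNode(target_name))

## 2) Forbid edges using the node objects returned by get_nodes()
bk = BackgroundKnowledge()
for source_idx, target_idx in nodes_forbidden:
    bk.add_forbidden_by_node(nodes[source_idx], nodes[target_idx])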

where node_pairs = [('feature_1', 'feature_2'), ('feature_x', 'feature_y'), ...] and nodes_forbidden = [(0, 7), (0, 14), (0, 21), (0, 28), ...].

I then rerun FCI with the background knowledge:

G, edges = fci(merged_data1.to_numpy(), background_knowledge=bk, node_names=merged_data1.columns)

When I check G, I can see that there is still a connection between nodes[0] and nodes[7], which I tried to forbid in the background knowledge. If there is a link between nodes 0 and 7, I do not want it to be nodes[0] -> nodes[7]. The other direction would be fine, but that is not what I get: I still have nodes[0] -> nodes[7].

My question is: how can I know which columns of my dataset X1, X2, ..., X287 map to? The nodes are not named after my columns even though I pass node_names=merged_data1.columns.
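One way to relate the default names back to the dataset is sketched below. It assumes that the default names X1, X2, ... follow the column order of the array passed to fci (X1 is the first column, and so on), which nothing in this thread confirms. The helper default_name_map and the three-column list are hypothetical stand-ins for merged_data1.columns.

```python
def default_name_map(columns):
    """Map default node names (X1, X2, ...) to column names.

    Assumption: node Xk corresponds to the k-th column (1-based)
    of the array that was passed to the search algorithm.
    """
    return {f"X{i + 1}": name for i, name in enumerate(columns)}


# Hypothetical stand-in for list(merged_data1.columns)
columns = ["feature_1", "feature_2", "feature_3"]

name_map = default_name_map(columns)
print(name_map["X2"])

# Reverse direction: turn feature-name pairs into index pairs,
# so that name-based and index-based forbidden lists can be compared.
index_of = {name: i for i, name in enumerate(columns)}
node_pairs = [("feature_1", "feature_2"), ("feature_3", "feature_1")]
index_pairs = [(index_of[a], index_of[b]) for a, b in node_pairs]
print(index_pairs)
```

If the assumption holds, the same dictionary can be used to relabel printed edges, and index_pairs can be checked against nodes_forbidden to confirm that both methods describe the same forbidden edges.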

Please help!

asha24choudhary commented 9 months ago

I investigated the final graph G returned by FCI further.

# Collect the forbidden pairs that still appear as directed edges in G
a = []
for pair in nodes_forbidden:
    if G.get_directed_edge(nodes[pair[0]], nodes[pair[1]]) is not None:
        print(pair)
        a.append(pair)

nodes_forbidden contains 40180 forbidden node pairs. The code above gives len(a) = 418, which means that 418 edges I forbade using bk.add_forbidden_by_node still exist. Could you please tell me why this is happening and how I can resolve it?

I do not understand the cause of the problem in the first place.
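The same check can also be expressed on plain index pairs, which may be easier to shrink into a small example. This is only a sketch: find_violations is a hypothetical helper, and directed_edges is made-up sample data standing in for whatever is extracted from G, not output from causal-learn.

```python
def find_violations(forbidden_pairs, directed_edges):
    """Return the forbidden (source, target) pairs that are present
    as directed edges, keeping the order of forbidden_pairs."""
    directed = set(directed_edges)  # set lookup keeps the scan linear
    return [pair for pair in forbidden_pairs if pair in directed]


# Forbidden source -> target index pairs (first entries from the list above)
nodes_forbidden = [(0, 7), (0, 14), (0, 21), (0, 28)]

# Hypothetical directed edges as (source_index, target_index) pairs
directed_edges = {(0, 7), (7, 14), (21, 0)}

violations = find_violations(nodes_forbidden, directed_edges)
print(violations)
```

Note that (21, 0) is not reported, because only the direction 0 -> 21 is forbidden. Keeping direction explicit in this way makes it easier to tell a genuine violation from an edge that merely connects the same two nodes.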

kunwuz commented 8 months ago

Thanks for reporting. If you would like to, could you please send us a minimal reproducing example of this issue (perhaps via email: yujiazh@cmu.edu)?