py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.
https://causal-learn.readthedocs.io/en/latest/
MIT License
1.12k stars 185 forks source link

Using background knowledge makes FCI algorithm slower #90

Open verae98 opened 1 year ago

verae98 commented 1 year ago

Hi! I would like to use the FCI algorithm with background knowledge, but I have noticed that the computation speed with the background is much slower than the computation speed without passing the background knowledge to FCI (or PC). I work with about 300 variables and there are not many dependencies (about 1,2 or 3) between them. In my background knowledge there are a lot of forbidden edges and some required edges. As far as I understand the FAS function, the amount of forbidden edges will reduce the adjacency list by a lot and also the separating set will be smaller than the version without the background knowledge.

I cannot find a explanation for the increase of computation speed, what am I missing?

Thanks in advance!

jdramsey commented 1 year ago

The issue I think is that checking background knowledge requires an overhead, so if you don't have to do it the algorithms will run faster.

The way it's stored is in pairs <{n1, ..., nm>, {nq...nr}>, where edges are forbidden in reverse. The set lookups should be fast (i.e., if you forbid all of the second from causing any of the first, and you try to add an edge x1->x2, if will look to see if x1 is in the second set and x2 is in the first set and if so will forbid it). If you give separate singleton pairs for each, like <{x1}, {x2}>, it has to go through a lot of sets to figure this out. Do you know how your knowledge is being entered?

verae98 commented 1 year ago

Thank you for your reply! I store the forbidden and required edges in self.forbidden_rules_specs and self.required_rules_specs for the BackgroundKnowledge class which I thought only allow singleton pairs, since the type is Set[Tuple[Node, Node]]. That explains why the algorithm is slower then. But how can I change then then to the pairs with multiple values?

verae98 commented 1 year ago

Hi, I have added a childclass of this BackgroundKnowledge class called FastBackgroundKnowledge in my code which inherits all the properties of the BackgroundKnowledge, except for the is_forbidden() and is_required() functions to fix my problem.

I have added two dictionaries for which the required and forbidden edges are stored in. For each required edge X1 -> X2 we have a dict_required {"X1" : [X2]}. This way the is_forbidden() function and is_required() function do not have to look through all sets, but has a lookup by node name. This has lowered the FCI function computation time significantly!

There is only one problem, in the file fas.py line 286, type checking has been done by stating type(knowledge) == BackgroundKnowledge I would like to suggest to change this to isinstance(knowledge, BackgroundKnowledge), such that inheritance is possible :)

kunwuz commented 1 year ago

Thanks so much! Your solution looks fantastic : ) If you would like to, could you please create a PR (including a short test example) for this so that your contribution could be better recognized? We can also review and push this if everything goes smoothly. Of course, we greatly appreciate your suggestions whether you would like to PR it or not.