py-why / pywhy-graphs

[Experimental] Causal graphs that are networkx-compliant for the py-why ecosystem.
https://py-why.github.io/pywhy-graphs/dev/index.html
MIT License
47 stars 8 forks source link

[ENH] Add the ability to find Proper Possibly Directed Paths #112

Closed aryan26roy closed 2 months ago

aryan26roy commented 4 months ago

Add the ability to find Proper Possibly Directed Paths to allow integration with DoWhy.

Fixes #111

Changes proposed in this pull request:

Before submitting

After submitting

adam2392 commented 3 months ago

The CIs should now work. You can also test locally ofc with pytest.

aryan26roy commented 2 months ago

@adam2392 I have completed the implementation and it seems to be correct. However, there is one problem. The output is a list of dictionaries. And the dictionaries can be in any order inside the list which is why the test may fail right now. How do I compare them in an order agnostic way?

I know the brute force way which is to compare every element of the expected output to every element of the actual output. Was wondering if there is a better way.

adam2392 commented 2 months ago

Is there any inspiration from network's method for testing all_simple_paths?

https://github.com/networkx/networkx/blob/main/networkx/algorithms/tests/test_simple_paths.py

aryan26roy commented 2 months ago

They are returning a set of tuples. I am assuming the order of tuples inside a set won't matter during comparison because of it's hashibality. However, I cannot store dictionaries inside sets. Either I change my implementation to store paths in sets(tuple()) or I can do the tests using the brute force method I described above.

What do you want me to do?

adam2392 commented 2 months ago

Is there any technical reason to do dictionaries instead of triples that I'm missing?

aryan26roy commented 2 months ago

Sorry for the late reply.

Is there any technical reason to do dictionaries instead of triples that I'm missing?

Not really, I just did not think of using tuples at the time. I think it would be easy to go from List to Set. A better approach in terms of memory as well.

aryan26roy commented 2 months ago

@adam2392 I just realised, since in this implementation, all the first nodes of all the paths are in X, this finds all the proper possibly directed paths, doesn't it?

adam2392 commented 2 months ago

Perhaps it's easier to implement just the function wrt a single node rather than a set of nodes. Do we need the functionality with X as a set? Wdyt?

aryan26roy commented 2 months ago

The paper assumes everywhere that X will be a set. And the current implementation already covers and tests X being a set. It seems to me that changing the implementation now to only support a single node in X would be wasted effort.

adam2392 commented 2 months ago

@adam2392 I just realised, since in this implementation, all the first nodes of all the paths are in X, this finds all the proper possibly directed paths, doesn't it?

Yes I believe so.

aryan26roy commented 2 months ago

@adam2392 I believe the PR is complete. Can you do one last round of review? (PS. The failing test is due to some environment issue with some existing test)

aryan26roy commented 2 months ago

@adam2392 I have incorporated all of your suggestions. Can you take another look?

aryan26roy commented 2 months ago

Done!

aryan26roy commented 2 months ago

@adam2392 This can be optimised further by switching to a stack based implementation rather than recursion. We can add this to the TODO list.

adam2392 commented 2 months ago

Thanks @aryan26roy !