Open stevenlujpl opened 3 years ago
Please note that I am aware of the build failures (code formatting issues; please see the screenshot below) caused by the implementation of the causal graph. I can't fix these code formatting issues yet because I have to use sys.path.append('/PATH/TO/fges-py') so that I can import the classes/functions needed for causal graphs. I will come up with something to replace sys.path.append('/PATH/TO/fges-py') and then fix the code formatting issues.
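One possible replacement for the hard-coded path, sketched under assumptions: read the location of the fges-py checkout from an environment variable. The variable name FGES_PY_PATH and the helper add_fges_to_path are hypothetical and are not part of DORA or fges-py.

```python
import os
import sys


def add_fges_to_path(env_var="FGES_PY_PATH"):
    """Append the fges-py checkout named by env_var to sys.path.

    Returns the path that was found, or None if the variable is unset
    or does not point to an existing directory.
    env_var is a hypothetical name chosen for this sketch.
    """
    path = os.environ.get(env_var)
    if not path or not os.path.isdir(path):
        return None
    if path not in sys.path:
        sys.path.append(path)
    return path
```

With this in place, each machine would only need to export the variable once instead of editing the source file.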
Below is a temporary solution to install DORA with causal graphs (for @hannah-rae to install it on the UMD machine).

1. Clone the fges-py GitHub repository (https://github.com/eberharf/fges-py): git clone https://github.com/eberharf/fges-py.git
2. Pull the causal-graph branch of the DORA repository: git pull origin causal-graph
3. Update sys.path.append() with the path to the fges-py repository on the UMD machine.
4. Install DORA: pip install . (please note the . at the end).

I changed the graph layout to be circular. With the circular layout, we can at least see which nodes are connected. Please take a look at the following examples, and let me know what you think. Thanks.
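For readers unfamiliar with the idea, a circular layout places every node at an equal angular step on a circle, so no node hides another and every edge is visible as a chord. The sketch below shows the idea in plain Python; it is not necessarily how DORA computes the layout (graph libraries such as networkx provide their own circular_layout function), and the function name here is only illustrative.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes at equal angles on a circle centered at the origin."""
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2.0 * math.pi * i / n
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```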
@stevenlujpl I think these look great.
If you have time for tiny updates, I suggest (1) highlighting (e.g. in red) any lines that connect to the "cluster" node (since they are of most immediate interest and I think the others are constant for all clusters), (2) labeling "cluster" as "cluster X" to show the cluster index, and (3) using a light color to fill the nodes (instead of dark blue) so that the black text on top is easier to read.
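The three suggestions can be sketched as a small helper that derives drawing attributes from an edge list. This is only an illustration: the function name, the feature names in the usage note, and the specific color strings are assumptions, not DORA code.

```python
# Light fill so black text on top of the nodes stays readable (assumed color).
NODE_FILL = "lightblue"


def style_graph(edges, cluster_index, cluster_node="cluster"):
    """Return (labels, edge_colors) for a causal graph drawing.

    Edges touching the cluster node are colored red; all others black.
    The cluster node is relabeled "cluster <index>".
    """
    edge_colors = [
        "red" if cluster_node in (u, v) else "black" for u, v in edges
    ]
    nodes = {node for edge in edges for node in edge}
    labels = {
        node: f"{cluster_node} {cluster_index}" if node == cluster_node else node
        for node in nodes
    }
    return labels, edge_colors
```

For example, with edges [("cluster", "feature_a"), ("feature_a", "feature_b")] and cluster index 3, the first edge is colored red and the node label becomes "cluster 3".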
@wkiri Thanks for the comments. I've incorporated them into the code. In addition, I added the sparsity parameter to the config file and seeded the SOM clustering algorithm. Please see the new graphs below (please note that the causal relations are different from the examples shown in the post above because the seed parameter used is different).
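Why the seed matters, as a minimal sketch: a SOM typically starts from randomly initialized weights, so a different seed gives a different starting point and can lead to different clusters and therefore different causal graphs. The function below is a hypothetical stand-in for that initialization step, not the SOM implementation used in DORA.

```python
import random


def init_som_weights(n_units, n_features, seed=None):
    """Return random initial weights; a fixed seed makes them reproducible."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_features)] for _ in range(n_units)]
```

Calling it twice with the same seed returns identical weights, while different seeds give different ones.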
@stevenlujpl The updated visualization looks fantastic!
@stevenlujpl Is this ready to be closed now?
@hannah-rae, Not yet. Currently, all the updates for causal graphs are in the causal-graph branch. I am waiting to hear from Eric on whether the Caltech professor who developed fges-py will create a setup.py script to package the repository. Below are the items I need to complete before we can close this issue:
1. Hear back on whether a setup.py will be added to the upstream fges-py repository.
2. If not, fork the fges-py repository, create a setup.py for fges-py, and then update our own setup.py to install the forked fges-py repository.
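If the fork route is taken, one way our setup.py could point at it is a PEP 508 direct reference, sketched below. The fork owner (FORK_OWNER) and the distribution name "fges-py" are placeholders, since the fork and its setup.py do not exist yet. Note that pip accepts direct references when installing from source, but PyPI rejects uploads whose dependencies use them, so publishing to PyPI would still need the fork to be available as a regular package.

```python
# Hypothetical fragment for DORA's setup.py.
# FORK_OWNER and the distribution name "fges-py" are placeholders.
FGES_FORK_URL = "git+https://github.com/FORK_OWNER/fges-py.git"

# PEP 508 direct reference: "<name> @ <url>"
install_requires = [
    f"fges-py @ {FGES_FORK_URL}",
]

# In setup.py this list would be passed as:
#   setup(..., install_requires=install_requires)
```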
Hi @hannah-rae, @urebbapr, @wkiri, @emhuff,

I've checked in the initial implementation of causal graphs to the causal-graph branch.

Example outputs of causal graphs

The example outputs of causal graphs generated using sample_data/earth_fieldsamples/points_to_fit.csv (data_to_fit) and sample_data/earth_fieldsamples/kenya_points_to_predict.csv (data_to_score) are shown below. Please note that I filtered out 981 data points that contain missing values from the sample_data/earth_fieldsamples/points_to_fit.csv file.

Cluster 0 causal graph
Cluster 1 causal graph
Cluster 2 causal graph
Cluster 3 causal graph
Cluster 4 causal graph

SOM clustering results: SOM-demud.csv
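The missing-value filtering mentioned above can be sketched with the standard library as follows. The set of strings treated as missing is an assumption about how gaps appear in the CSV, and the helper name is illustrative; with pandas, DataFrame.dropna() serves the same purpose.

```python
import csv
import io

# Assumed markers for a missing value; adjust to match the actual file.
MISSING = {"", "nan", "na", "null"}


def drop_rows_with_missing(csv_text):
    """Split CSV text into (header, kept_rows, dropped_count).

    A row is dropped if it has the wrong number of fields or if any
    field matches one of the MISSING markers (case-insensitive).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    kept, dropped = [], 0
    for row in reader:
        incomplete = len(row) != len(header)
        if incomplete or any(cell.strip().lower() in MISSING for cell in row):
            dropped += 1
        else:
            kept.append(row)
    return header, kept, dropped
```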
Implementation summary

Causal graphs are currently implemented together with the kmeans or SOM clustering algorithm in the Results Organization module. This is how causal graphs are generated in the DES codebase, and for the initial implementation of causal graphs in DORA, I decided to do the same thing. I don't think clustering algorithms are necessary to generate causal graphs. It seems to me that we can generate causal graphs for individual data points instead of a group of data points. If generating causal graphs for individual data points is desired, I can add this ability to DORA. Please let me know what you think.

There is one issue that I don't know how to resolve yet. Causal graphs are generated using classes/functions in the fges-py GitHub repository, but this repository isn't installable (the authors don't provide a setup.py script). This isn't a big problem for using causal graphs on UMD/JPL machines: we can manually git clone the repository and do something like sys.path.append("/PATH/TO/fges-py/") to import the classes/functions we need. However, this will become a problem when we publish the DORA codebase to PyPI as a pip-installable package. I will need to think more about how to resolve this problem. Please let me know if you have any suggestions.

Use causal graphs
For now, causal graphs must be generated with the kmeans or SOM clustering algorithm. Please see the following example configs for the Results Organization module:

One causal graph will be generated per cluster group, and the causal graphs will be saved in the directory defined by the out_dir option in the config file.
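The "one graph per cluster, saved under out_dir" behavior can be pictured with the sketch below. The helper names and the file name pattern are hypothetical and may not match what DORA actually writes.

```python
import os
from collections import defaultdict


def group_by_cluster(labels):
    """Map each cluster label to the indices of the data points in it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)


def graph_paths(labels, out_dir):
    """Return one output path per cluster (file name pattern is assumed)."""
    return {
        label: os.path.join(out_dir, f"causal_graph_cluster_{label}.png")
        for label in sorted(set(labels))
    }
```

For cluster labels [0, 1, 0, 2], this yields three groups and three output paths, one for each of clusters 0, 1, and 2.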