py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

gcm.arrow_strength providing different ranking #1130

Closed: ankur-tutlani closed this issue 4 months ago

ankur-tutlani commented 5 months ago

I am using the arrow_strength function to identify the top nodes driving variation in the target node (Growth).

import pandas as pd
from dowhy import gcm
arrow_strengths = gcm.arrow_strength(scm, target_node='Growth', num_samples_conditional=5000,
                                     difference_estimation_func=gcm.divergence.estimate_kl_divergence_continuous_knn)
arrow_strength_pd = pd.DataFrame(list(arrow_strengths.items()), columns=['edge', 'importance'])
arrow_strength_pd = arrow_strength_pd.sort_values('importance', ascending=False)

There are roughly 40 nodes. After sorting, I get different rankings across runs: a node ranked 10th in one iteration can move to 30th place in another (or vice versa), even with the same causal graph and data. Is this behavior expected? Does it depend on the causal graph structure?

Version information:

bloebp commented 5 months ago

The arrow strength estimation involves sampling, which leads to variation between runs. You can reduce this by changing some parameters, e.g., setting the tolerance to a smaller number.
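
For illustration, a minimal sketch of what that could look like (the tolerance and max_num_runs values below are just example settings, not recommendations; check the gcm.arrow_strength docstring of your installed version for the exact parameter names and defaults):

arrow_strengths = gcm.arrow_strength(
    scm,
    target_node='Growth',
    num_samples_conditional=5000,
    max_num_runs=10000,  # example value: allow more estimation runs before stopping
    tolerance=0.005,     # example value: a smaller tolerance reduces run-to-run variation
    difference_estimation_func=gcm.divergence.estimate_kl_divergence_continuous_knn)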

Generally, if the rankings change that much between runs, it seems the connections are either equally strong or too weak in general (or the model simply isn't capturing them accurately enough). What is the range of the values?
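
(For instance, a quick look at the spread using your existing DataFrame:)

# Inspect the spread of the estimated edge strengths
print(arrow_strength_pd['importance'].describe())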

You can also take a look at estimating confidence intervals; they might provide better insight: https://www.pywhy.org/dowhy/v0.11.1/user_guide/modeling_gcm/estimating_confidence_intervals.html#conveniently-bootstrapping-graph-training-on-random-subsets-of-training-data
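
Roughly following that page, a sketch of bootstrapping the training and recomputing the arrow strengths (scm and data stand for your causal model and training DataFrame; num_bootstrap_resamples is just an example value):

# Refit the model on random subsets of the data, recompute arrow strengths each time,
# and summarize the median strength and confidence interval per edge.
median_strength, strength_intervals = gcm.confidence_intervals(
    gcm.fit_and_compute(gcm.arrow_strength,
                        scm,
                        bootstrap_training_data=data,
                        target_node='Growth'),
    num_bootstrap_resamples=20)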

ankur-tutlani commented 5 months ago

Thanks for sharing the link; this is helpful. Are there any recommendations in the library on the following?

  1. If the causal graph structure is not very certain: the "auto" option takes care of causal mechanisms, but is there anything similar for the graph too?
  2. What are the recommendations for improving the model if we get, say, the following evaluation result?

The overall average KL divergence between the generated and observed distribution is 0.6444021604490836. The estimated KL divergence indicates a good representation of the data distribution, but might indicate some smaller mismatches between the distributions.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 14 days with no activity.

bloebp commented 5 months ago

Sorry for the late reply!

1. If the causal graph structure is not very certain: the "auto" option takes care of causal mechanisms, but is there anything similar for the graph too?

You can take a look at https://github.com/py-why/causal-learn, which is a package for inferring the causal graph from data.
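
As a rough sketch of what that can look like (the PC algorithm is just one of the available methods, and the call below assumes data is a pandas DataFrame with one column per node):

from causallearn.search.ConstraintBased.PC import pc

# Run the PC algorithm on the raw data matrix to get a candidate causal graph.
cg = pc(data.to_numpy())
print(cg.G)  # the learned graph; edges should still be reviewed with domain knowledge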

2. What are the recommendations for improving the model if we get, say, the following evaluation result?

You could try setting the quality parameter of the auto assignment function to BETTER (see the docstring of that function). Let me know if this improves the results (i.e., gives a lower KL divergence). Otherwise, you might need to check manually which causal mechanisms can be improved; the per-node performance results can give some insight there.
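
Something along these lines (assuming scm and data are your causal model and training DataFrame, and that your dowhy version exposes gcm.evaluate_causal_model):

from dowhy import gcm

# Re-assign causal mechanisms with a more extensive model search, then refit and re-evaluate.
gcm.auto.assign_causal_mechanisms(scm, data, quality=gcm.auto.AssignmentQuality.BETTER, override_models=True)
gcm.fit(scm, data)
print(gcm.evaluate_causal_model(scm, data))  # reports per-mechanism performance and the KL divergence summary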

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.