jtcbrule / whittemore

Causal programming in Clojure
Eclipse Public License 1.0

Add references #1

Open samusz opened 3 years ago

samusz commented 3 years ago

Hi, I've stumbled on your interesting package, as it stands at the intersection of two of my current interests (Clojure and causality).

I believe adding links to your papers and other references to README.md would improve this package's discoverability.

Best.

jtcbrule commented 3 years ago

Thanks for the suggestion. I've added a couple of references to README.md.

One problem that I didn't appreciate originally is that the approach used in Whittemore does not easily extend beyond categorical random variables to other types of random variables.

There's something of a 'tension' between identification and estimation. The identification algorithm (a variant of Shpitser's ID algorithm) makes no assumptions about the random variables, other than those assumptions that are represented in the causal diagram. The result is a functional (formula) that computes the causal effect, from the population probability distribution.
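To make the "functional" concrete, here is a minimal sketch in Python (not Clojure, and not Whittemore's API). It assumes a hypothetical three-variable diagram Z -> X, Z -> Y, X -> Y with made-up binary probabilities, for which identification yields the well-known back-door adjustment formula P(y | do(x)) = sum over z of P(y | x, z) P(z). All names and numbers below are illustrative assumptions.

```python
from itertools import product

# Hypothetical population parameters for the diagram Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.5                                   # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,   # P(Y = 1 | X = x, Z = z),
                 (0, 1): 0.5, (1, 1): 0.7}   # keyed by (x, z)


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1.0 - p


def joint():
    """Population joint distribution P(z, x, y) as a dict keyed by (z, x, y)."""
    return {
        (z, x, y): bern(P_Z1, z)
        * bern(P_X1_GIVEN_Z[z], x)
        * bern(P_Y1_GIVEN_XZ[(x, z)], y)
        for z, x, y in product((0, 1), repeat=3)
    }


def prob(p, **fixed):
    """Sum the joint over all outcomes consistent with the fixed values."""
    return sum(
        v
        for (z, x, y), v in p.items()
        if all({"z": z, "x": x, "y": y}[k] == val for k, val in fixed.items())
    )


def backdoor(p, x, y):
    """Identified functional: P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    return sum(
        prob(p, z=z, x=x, y=y) / prob(p, z=z, x=x) * prob(p, z=z)
        for z in (0, 1)
    )


def conditional(p, x, y):
    """Ordinary conditional P(y | x), which is confounded by Z here."""
    return prob(p, x=x, y=y) / prob(p, x=x)
```

Calling `backdoor(joint(), x=1, y=1)` and `conditional(joint(), x=1, y=1)` gives different numbers in this setup, which is the point: the functional is applied to the population distribution and needs nothing beyond the diagram's assumptions.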

But estimating the causal effect from a sample distribution can't be done without additional assumptions.

The original idea behind Whittemore was to try to cleanly separate assumptions used in identification (represented in a causal diagram) and the assumptions used in estimation (which would be implemented by extending an appropriate Clojure protocol). This was easy for categorical random variables, and should be straightforward for normally distributed random variables (although I have not implemented this).
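A rough sketch of that separation, written in Python with `typing.Protocol` standing in for a Clojure protocol. The interface (`Distribution`, `prob`, `support`), the `Categorical` class, and `backdoor_effect` are hypothetical names invented for this illustration and are not Whittemore's actual API. The identified formula (here the back-door adjustment, assumed to come from some diagram with a single adjustment variable) is written once against the interface, and the estimation assumptions live entirely in the implementing class.

```python
from typing import Mapping, Optional, Protocol, Sequence

Row = Mapping[str, int]


class Distribution(Protocol):
    """Hypothetical estimation interface: anything that can answer P(target | given)."""

    def prob(self, target: Row, given: Optional[Row] = None) -> float: ...

    def support(self, var: str) -> Sequence[int]: ...


class Categorical:
    """Estimation assumption: variables are categorical, so use relative frequencies."""

    def __init__(self, samples: Sequence[Row]):
        self.samples = list(samples)

    def support(self, var: str) -> Sequence[int]:
        return sorted({row[var] for row in self.samples})

    def prob(self, target: Row, given: Optional[Row] = None) -> float:
        given = given or {}
        matching = [r for r in self.samples
                    if all(r[k] == v for k, v in given.items())]
        if not matching:
            raise ValueError(f"no samples match the condition {dict(given)}")
        hits = sum(all(r[k] == v for k, v in target.items()) for r in matching)
        return hits / len(matching)


def backdoor_effect(dist: Distribution, y: Row, x: Row, z_var: str) -> float:
    """Identified formula sum_z P(y | x, z) P(z), independent of how `dist` estimates."""
    return sum(
        dist.prob(y, {**x, z_var: z}) * dist.prob({z_var: z})
        for z in dist.support(z_var)
    )
```

With this shape, supporting normally distributed variables would mean adding another class that satisfies `Distribution`, while `backdoor_effect` stays unchanged.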

It is much less clear how to extend this to arbitrary, continuous random variables. I had vague plans to use some kind of multivariate kernel density estimation and MCMC methods to implement a "plug and chug" estimator, but didn't get around to it.
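For a sense of what a "plug and chug" estimator for continuous variables might look like, here is a small Python sketch. It is not the KDE-plus-MCMC design mentioned above: it swaps in a simpler technique, Nadaraya-Watson kernel regression, and plugs it into the back-door formula E[Y | do(x)] ≈ (1/n) Σᵢ Ê[Y | X = x, Z = zᵢ]. The diagram (Z -> X, Z -> Y, X -> Y), the simulated linear-Gaussian data, the bandwidth, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; normalization cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_regression(data, x, z, h):
    """Nadaraya-Watson estimate of E[Y | X=x, Z=z] with a product Gaussian kernel."""
    weights = [
        gaussian_kernel((xi - x) / h) * gaussian_kernel((zi - z) / h)
        for xi, zi, _ in data
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no samples close enough to the query point")
    return sum(w * yi for w, (_, _, yi) in zip(weights, data)) / total


def plug_in_effect(data, x, h=0.5):
    """Plug-in back-door estimate: average E_hat[Y | X=x, Z=z_i] over observed z_i."""
    return sum(kernel_regression(data, x, zi, h) for _, zi, _ in data) / len(data)


def simulate(n, seed=0):
    """Hypothetical confounded data: Z -> X, Z -> Y, X -> Y, true effect of X is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z + rng.gauss(0, 1)
        y = 2 * x + 3 * z + rng.gauss(0, 1)
        rows.append((x, z, y))
    return rows
```

On data from `simulate`, the contrast `plug_in_effect(data, 1.0) - plug_in_effect(data, 0.0)` should land near the true coefficient of 2, up to smoothing bias and sampling noise. The catch is the one raised above: the bandwidth and kernel are themselves estimation assumptions, and this approach degrades quickly as the number of adjustment variables grows.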

I think there's been a lot of work on better estimation techniques recently, but I haven't been keeping up with the literature.

samusz commented 3 years ago

Thank you for your remarks. I am quite new to Clojure, but this project is of particular interest to me because I mostly work with causal diagrams: we use dagitty.net to derive our confounding variables from a (somewhat) complete DAG, and then do classical statistics (odds ratios, etc.), with very few Bayesian analyses, unfortunately, since classical methods are "the norm" in big cohort / longitudinal studies.