matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License

Chapter 10 - Matching #332

Closed grstathis closed 1 year ago

grstathis commented 1 year ago

This is more of a clarification than an issue.

In Chapter 10, in the paragraph about matching bias, the book states: "The ATE estimate is just a little bit lower than mine, so probably my code is not perfect, so here is another reason to import someone else’s code instead of building it your own."

I implemented the book's code and compared it against the implementation in the causalinference Python package. The slight difference in ATE comes from how each one finds the nearest-neighbor matches: the book uses sklearn's KNN implementation, while causalinference computes Euclidean distances with scipy. The two methods produce slightly different matches because they break ties at different indexes of the data (probably due to the tree-search implementation of the KNN).
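A toy sketch of the effect described above, in plain Python with made-up data. It is not the sklearn or causalinference code: the helper names `nearest_index` and `matched_effect` and the `prefer` argument are hypothetical, and the "first index wins" versus "last index wins" rules only stand in for whatever ordering each library actually ends up with. The gap is exaggerated here on purpose, because every treated unit has tied neighbors.

```python
def nearest_index(x, candidates, prefer="first"):
    """Return the index of the candidate closest to x.

    When several candidates are equally close, `prefer` decides whether
    the first or the last tied index wins (hypothetical tie-break rules).
    """
    dists = [abs(x - c) for c in candidates]
    best = min(dists)
    tied = [i for i, d in enumerate(dists) if d == best]
    return tied[0] if prefer == "first" else tied[-1]


def matched_effect(treated, controls, prefer="first"):
    """Average of (treated outcome - matched control outcome), 1-NN matching."""
    control_x = [x for x, _ in controls]
    diffs = []
    for x_t, y_t in treated:
        j = nearest_index(x_t, control_x, prefer)
        diffs.append(y_t - controls[j][1])
    return sum(diffs) / len(diffs)


# (covariate, outcome) pairs; illustrative numbers only.
controls = [(1.0, 10.0), (3.0, 14.0), (5.0, 20.0)]
treated = [(2.0, 15.0), (4.0, 21.0)]  # each one is equidistant from two controls

effect_first = matched_effect(treated, controls, prefer="first")
effect_last = matched_effect(treated, controls, prefer="last")
print(effect_first, effect_last)  # same data, different tie-break, different estimate
```

Both calls use the same data and the same distance; only the choice among equally distant controls changes, and that alone moves the estimate.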

Your code is fine :-)

Suggestion: if you want to be explicit, the text could safely say something along the lines of: "The ATE estimate is just a little bit lower than mine, due to the difference in how sklearn's KNN implementation and the causalinference Python package break ties between matches. It is always a good idea to validate your code against tested libraries."

matheusfacure commented 1 year ago

Thanks for clarifying this!