nredell / shapFlex

An R package for computing asymmetric Shapley values to assess causality in any trained machine learning model

Clarify and add sampling methods #3

Closed nredell closed 4 years ago

nredell commented 5 years ago

The sampling method(s) in the package need to be more clearly spelled out. There are a couple of related methods in the literature that I'd like to incorporate. Namely, the user should be able to make a clear trade-off between sampling instances and sampling feature orderings. Right now, the stochasticity in the algorithm comes from sampling a random instance and shuffling its features once, in one go...but there might be a benefit to sampling one instance and shuffling its features multiple times. It seems like both approaches would converge in the limit, but the whole point of the Monte Carlo approach is that we're nowhere near "the limit". Also, the impact of feature dependence needs to be worked out. I've done some reading here but I'm not confident about what the best approach is.
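To make the trade-off concrete, here is a minimal sketch of a Monte Carlo Shapley estimator in Python with separate budgets for sampled instances and for shuffles per instance. It is an illustration only, not shapFlex's implementation: the function name `mc_shapley` and the parameters `n_instances` and `n_shuffles` are hypothetical, and it assumes features are independent, so replacing a feature with the value from a random background row is taken to be reasonable.

```python
import random

def mc_shapley(predict, x, data, feature, n_instances=50, n_shuffles=1, seed=0):
    """Monte Carlo estimate of the Shapley value of `feature` for instance `x`.

    Hypothetical sketch: `n_instances` background rows are sampled, and each
    row is reused with `n_shuffles` random feature orderings.
    """
    rng = random.Random(seed)
    p = len(x)
    total = 0.0
    for _ in range(n_instances):
        z = rng.choice(data)              # random background instance
        for _ in range(n_shuffles):
            order = list(range(p))
            rng.shuffle(order)            # random feature ordering
            pos = order.index(feature)
            preceding = set(order[:pos])  # features ahead of the target
            # Features ahead of the target come from x, the rest from z.
            # The two rows differ only in the target feature.
            with_j = [x[k] if (k in preceding or k == feature) else z[k]
                      for k in range(p)]
            without_j = [x[k] if k in preceding else z[k] for k in range(p)]
            total += predict(with_j) - predict(without_j)
    return total / (n_instances * n_shuffles)
```

With `n_shuffles = 1` this matches the "one instance, one shuffle" scheme described above. Raising `n_shuffles` while lowering `n_instances` spends the same number of model evaluations on fewer background rows and more orderings per row.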

nredell commented 5 years ago

Added the argument shapFlex(shuffle = ...), which supports the explore/exploit trade-off. I need to run some simulations to look at parameter recovery along this scale.

nredell commented 4 years ago

This is a great paper about asymmetric Shapley values and causality (https://arxiv.org/pdf/1910.06358.pdf). The implementation is fairly straightforward, though the API needs some thought when it comes to having the user specify causal constraints. lavaan and r-causal are possible approaches, but I'm not a huge fan of specifying constraints in one long string. I need to look more into their implementations and others. In any case, this is next on the implementation list because it's an infinitely useful iml (interpretable machine learning) method.
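As one possible alternative to a single long constraint string, the causal constraints could be passed as a list of (cause, effect) pairs. The Python sketch below is a hypothetical illustration, not shapFlex's or the paper's API: it assumes that the asymmetric variant only averages over feature orderings in which every cause precedes its effect, and it draws such orderings by simple rejection sampling. The name `sample_causal_order` and its parameters are made up for this example.

```python
import random

def sample_causal_order(features, constraints, rng, max_tries=10_000):
    """Draw a random feature ordering in which each cause precedes its effect.

    `constraints` is a list of (cause, effect) pairs (hypothetical format).
    Rejection sampling is uniform over the consistent orderings, but it
    becomes slow when there are many features or many constraints.
    """
    for _ in range(max_tries):
        order = list(features)
        rng.shuffle(order)
        rank = {f: i for i, f in enumerate(order)}
        if all(rank[cause] < rank[effect] for cause, effect in constraints):
            return order
    raise ValueError("no consistent ordering found; constraints may be cyclic")
```

For example, `sample_causal_order(["age", "education", "income"], [("age", "education")], random.Random(1))` returns an ordering with "age" ahead of "education". Each sampled ordering could then replace the unconstrained shuffle in a Monte Carlo Shapley loop.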

nredell commented 4 years ago

We're going to go ahead and close this out. This package has gone full "causal".