iancovert / sage

For calculating global feature importance using Shapley values.
MIT License

Parallelized computation #11

Closed: domenicodigangi closed this issue 1 year ago

domenicodigangi commented 2 years ago

First of all, many thanks for the amazing package and the very clear paper and blog post referenced in the README. Please correct me if I am mistaken, but it looks like you are not using multiprocessing (I am thinking of joblib) to compute the terms in the Shapley values in parallel. Is there a technical reason for this? If not, are you planning to add it in the future?

iancovert commented 2 years ago

Hi Domenico, thanks for posting, and I'm glad you enjoyed the paper and blog post. You're right that we're not currently doing any parallelization for SAGE value estimation - we just have a single loop that generates samples (which are averaged and monitored for convergence). But we could theoretically speed things up by having multiple jobs generate and aggregate samples. I hadn't planned on adding this because no one had asked before, but maybe it could be worthwhile. Would this be helpful for a use case you're encountering?

j-adamczyk commented 2 years ago

@iancovert I have the same request as @domenicodigangi. The use case is using SAGE for larger datasets. I think there is no theoretical disadvantage to using multiprocessing here, and with joblib this would be quite trivial to implement. For my use case, which is comparing the speed of mean absolute SHAP to SAGE, this would also make for a fairer comparison, since I already parallelize the SHAP computation through joblib. SAGE should obviously be faster even in this case, but the difference could be even larger.

Michellemingxuan commented 2 years ago

@j-adamczyk Hi! I'm also comparing mean absolute SHAP to SAGE, but the results are not what I expected (no parallelization for either). May I know which SHAP explainer you are using for this comparison? Thanks!

j-adamczyk commented 2 years ago

@Michellemingxuan I am using TreeExplainer. However, this turned out to be highly nontrivial due to the SHAP + LightGBM combination (chosen for speed: about 10x faster than XGBoost and at least 100x faster than Random Forest), which required some monkey patching and other odd fixes. Parallelized SAGE would make this much easier (as long as the results are similar enough, which I have not tested).
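For context, here is a minimal sketch of the mean-absolute-SHAP baseline being discussed, assuming a LightGBM regressor trained on toy data; the model, data, and variable names are placeholders, not code from this repository or from the comparison above.

```python
import numpy as np
import shap
from lightgbm import LGBMRegressor

# Toy data and model purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
model = LGBMRegressor(n_estimators=100).fit(X, y)

# Local SHAP values via TreeExplainer, aggregated into a global importance
# score by taking the mean absolute value per feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # shape (n_samples, n_features)
global_importance = np.abs(shap_values).mean(axis=0)
print(global_importance)
```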

@iancovert could you point me to areas where parallelization could be implemented in SAGE? I can make a PR, I just need a bit of help.

iancovert commented 2 years ago

Hi Jakub - thanks for following up on this, and sorry for the slow reply. I've thought about this a bit (though I haven't had a chance to work on it) and I think you're right that it may be simple to implement with joblib, and there would be no theoretical tradeoffs (the algorithm would remain exactly the same).

If you have bandwidth to give it a try, I think I know which part we would want to parallelize. The permutation sampling algorithm (implemented here) involves obtaining a minibatch of samples and a feature permutation for each sample, making predictions using the corresponding subsets of features, calculating the change in loss, and updating the average loss delta associated with each feature. If we could kick off child processes to concurrently handle different minibatches of samples, I think we would see a large speedup. The child processes would basically need to run lines 109-149, send the results (scores and sample_counts) to the parent so it can run lines 155-187, and repeatedly do this until convergence (detected by the parent).
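A rough sketch of that structure, assuming a hypothetical sample_minibatch worker and an imputer callable of the form imputer(x, S) - these are placeholder names and simplified logic, not the actual functions in sage:

```python
import numpy as np
from joblib import Parallel, delayed


def sample_minibatch(imputer, loss_fn, X, Y, batch_size, seed):
    """Hypothetical child-process worker: roughly the per-batch work
    described above (one feature permutation per sample, loss deltas
    accumulated per feature). Returns partial scores and sample counts."""
    rng = np.random.default_rng(seed)
    num_features = X.shape[1]
    scores = np.zeros(num_features)
    counts = np.zeros(num_features)
    for i in rng.integers(0, len(X), size=batch_size):
        x, y = X[i:i + 1], Y[i:i + 1]
        S = np.zeros((1, num_features), dtype=bool)    # empty coalition
        prev_loss = loss_fn(imputer(x, S), y)
        for j in rng.permutation(num_features):
            S[0, j] = True                             # add feature j
            loss = loss_fn(imputer(x, S), y)
            scores[j] += prev_loss - loss              # loss delta for j
            counts[j] += 1
            prev_loss = loss
    return scores, counts


def parallel_sage_values(imputer, loss_fn, X, Y,
                         batch_size=32, n_jobs=4, max_rounds=100):
    """Parent loop: dispatch minibatches to workers, aggregate the returned
    scores and counts, and (in a real implementation) check a convergence
    criterion after each round, as the sequential estimator does."""
    num_features = X.shape[1]
    total_scores = np.zeros(num_features)
    total_counts = np.zeros(num_features)
    for r in range(max_rounds):
        results = Parallel(n_jobs=n_jobs)(
            delayed(sample_minibatch)(imputer, loss_fn, X, Y,
                                      batch_size, seed=r * n_jobs + k)
            for k in range(n_jobs)
        )
        for scores, counts in results:
            total_scores += scores
            total_counts += counts
        # Convergence monitoring on the running averages would go here.
    return total_scores / np.maximum(total_counts, 1)
```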

Does that make sense? I'd be very happy to discuss more and help out if you want to try implementing.

Michelle - sorry for the slow response, I just replied to your Disqus comment on the blog post. We used the permutation sampling estimator when comparing to SHAP.

j-adamczyk commented 2 years ago

@iancovert thanks for the reply. I actually stumbled upon those lines myself when searching recently, so fortunately I am familiar with what is happening there. I think that parallelizing this will be very similar to parallelizing SHAP, which I have already done. I will try to make a PR soon.

iancovert commented 2 years ago

I think you're right, it should be similar to how you would parallelize SHAP. Very cool, well please let me know if I can be of any help with your PR - excited to see if this works.

j-adamczyk commented 2 years ago

@iancovert I am not sure we can replicate exactly the same behaviour with parallelization. Before permutation sampling, scores is initialized as an all-zeros array. In the loop, it is zeroed only if relaxed is true; otherwise the previous values are kept. scores is then modified in line 148 by setting scores[arange, inds] = prev_loss - loss. So for relaxed=False, the current code relies on sequential execution.

By the logic in lines 74-79, relaxed being true requires min_coalition > 0 or max_coalition < num_features to be false, so min_coalition <= 0 and max_coalition >= num_features has to be true, if I transform this correctly. I am not sure that this is even possible. If we assume that relaxed is always true, then we can initialize scores as all zeros in each child process. Alternatively, we can keep sample_counts as is, but always initialize scores as all zeros. Both would change the current behavior, but the second option would change the underlying logic more. What do you think?
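For reference, the boolean transformation described above can be restated as follows (illustrative only, with placeholder default values; not the actual sage source):

```python
num_features = 10
min_coalition, max_coalition = 0, num_features   # unrestricted defaults

# relaxed is described above as requiring this condition to be false.
restricted = min_coalition > 0 or max_coalition < num_features
relaxed = not restricted

# By De Morgan's law, this matches the equivalent form quoted in the comment.
assert relaxed == (min_coalition <= 0 and max_coalition >= num_features)
```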

j-adamczyk commented 2 years ago

@iancovert under the assumption that relaxed=True, I managed to parallelize PermutationExplainer, getting speedups of 30% to 50%. Once we decide what to do about relaxed, I will make the PR.

iancovert commented 2 years ago

Hi Jakub - sorry for the slow reply. I think allowing relaxed=True is a nice feature, but it's probably not necessary. The feature isn't described in any of the demos or in any papers so I doubt many people use it, and it might be easier to implement parallelization without it. What do you think of just removing it and sticking with the simpler calculation?

And by the way, does your implementation make a separate ParallelPermutationExplainer class, or is parallelization built into the original one? I'm not sure what would be best, but I guess many other libraries implement parallelization in the main function and expose a num_workers argument. What do you think?

j-adamczyk commented 2 years ago

Hi Ian. If we stick with one version with relaxed=False, then the parallel and sequential implementations would really be identical, so we could just modify the existing class. We can indeed use n_jobs like scikit-learn, with a default value of 1, so the behavior won't change (apart from edge cases with relaxed=True). I think dropping relaxed=True is also good from the point of view of code complexity: as you say, it is not documented and should not have a noticeable impact on results anyway.
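For illustration, the interface being discussed could look roughly like the sketch below, reusing the hypothetical sample_minibatch worker from the earlier sketch; the class name and arguments are placeholders, not the merged implementation.

```python
import numpy as np
from joblib import Parallel, delayed


class PermutationEstimatorSketch:
    """Placeholder class showing an n_jobs argument with a scikit-learn-style
    default of 1, so existing single-process behaviour is unchanged."""

    def __init__(self, imputer, loss_fn, n_jobs=1):
        self.imputer = imputer
        self.loss_fn = loss_fn
        self.n_jobs = n_jobs

    def __call__(self, X, Y, batch_size=32, n_batches=64):
        # With n_jobs=1, joblib runs the batches sequentially; with n_jobs>1,
        # they are distributed across worker processes.
        results = Parallel(n_jobs=self.n_jobs)(
            delayed(sample_minibatch)(self.imputer, self.loss_fn, X, Y,
                                      batch_size, seed=k)
            for k in range(n_batches)
        )
        scores = sum(s for s, _ in results)
        counts = sum(c for _, c in results)
        return scores / np.maximum(counts, 1)
```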

I will make a PR with my changes soon. One more question - should we make joblib a required dependency? I think that should not be problematic, since scikit-learn, XGBoost, LightGBM, etc. require it anyway.

On a related note, what do you think about rewriting parts of SAGE in Cython? I tried optimizing a few loops with NumPy but didn't get any improvements. Cython would definitely help here, and it does not require any code changes apart from moving the most compute-heavy functions to another module, compiling them, and calling them from the regular classes. This would be almost identical to how scikit-learn does it.

iancovert commented 2 years ago

Setting n_jobs=1 as a default argument sounds like the right approach. And making joblib a required dependency seems like a good idea to me - like you said, it should be harmless since so many packages already require it.

As for shifting some of the implementation to Cython, I think that's a great idea as well. I would be very interested to see what kind of speedup that provides. I don't have any experience with this - is it something you would also be comfortable trying? I appreciate all these ideas!

j-adamczyk commented 2 years ago

I don't have much experience with Cython either, but I've read about it and wrote a "Hello world" some time ago. After I make the joblib PR, I will try to work on this.

iancovert commented 2 years ago

Okay, so it would be new for you as well. Well, I appreciate your willingness to look into this - I'll be happy to help in any way I can.

iancovert commented 1 year ago

Closing this because Jakub's PR has been merged. Thanks!