Closed: lorenjan closed this pull request 1 year ago.
Hello! Looking forward to your reviews and thoughts on these changes. Hopefully it's interesting to see what I had to do to run powershap on larger datasets (millions of rows, thousands of columns).

There were data-structure copies being made in a couple of places that consumed a lot of extra memory, so I removed them, but maybe this is not safe to do for some reason? Please let me know if I did anything wrong and I will happily improve the code where I can.
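The PR diff itself isn't shown here, but the general idea of avoiding redundant copies can be sketched with a minimal NumPy example (a stand-in illustration, not the actual powershap code; the array names are hypothetical). Basic slicing returns a view that shares the original buffer, while an explicit `.copy()` materializes a second buffer and doubles the memory held for that data:

```python
import numpy as np

# A small array standing in for a wide feature matrix; on real data
# (millions of rows) the duplicated buffer is what exhausts memory.
X = np.arange(12, dtype=np.float64).reshape(3, 4)

# Basic slicing returns a *view*: no data is copied, so memory stays flat.
view = X[:, :2]
print(np.shares_memory(X, view))   # True: same underlying buffer

# .copy() (like fancy indexing) allocates a second buffer of the same data.
dup = X[:, :2].copy()
print(np.shares_memory(X, dup))    # False: independent allocation
```

The trade-off is the usual one: a view is only safe if no downstream code mutates it expecting an independent object, which is presumably why the change needs review.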
Merging #38 (c517bf6) into main (6b54fb7) will increase coverage by 0.08%. The diff coverage is 100.00%.
```
@@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
+ Coverage   83.87%   83.95%   +0.08%
==========================================
  Files           6        6
  Lines         372      374       +2
==========================================
+ Hits          312      314       +2
  Misses         60       60
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| powershap/powershap.py | 68.67% <100.00%> (ø) | |
| powershap/shap_wrappers/shap_explainer.py | 96.83% <100.00%> (+0.04%) | :arrow_up: |
If you add some more comments about the reasoning behind this change (https://github.com/predict-idlab/powershap/pull/38#discussion_r1189452588), this PR is going in :)
This is a major contribution to powershap, @lorenjan. I enjoyed discussing the code changes with you & I really appreciate the effort you put into this :handshake:
These modifications make it possible to run powershap on much larger datasets without running out of memory. This is especially useful in environments such as Kubernetes clusters, which may terminate your process if you exceed a memory limit.