linkedin / FastTreeSHAP

Fast SHAP value computation for interpreting tree-based models
BSD 2-Clause "Simplified" License

SHAP Values Change and Additivity Breaks on NumPy Upgrade #15

Open ngupta20 opened 1 year ago

ngupta20 commented 1 year ago

Description

Following the 0.1.3 release, the additivity property of my logit-linked xgboost local tree SHAP values broke: the base value plus the per-feature contributions no longer sums to the model's raw margin output. Pinning numpy back to either 1.21.6 or 1.22.4 temporarily resolves the issue. I also tested every numpy version from 1.23.1 through 1.23.5, and all 1.23.* versions produce the changed values.
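
For reference, below is a minimal, self-contained sketch of the kind of additivity check that breaks for me. The data and model here are synthetic stand-ins for my real pipeline, and I'm assuming fasttreeshap kept shap's check_additivity flag:

import numpy as np
import xgboost as xgb
import fasttreeshap
from sklearn.datasets import make_classification

# synthetic stand-in for my real logit-linked binary classification setup
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = xgb.XGBClassifier(n_estimators=50, max_depth=4).fit(X, y)

explainer = fasttreeshap.TreeExplainer(model)
# disable the built-in check so the error can be measured directly
sv = np.asarray(explainer.shap_values(X, check_additivity=False))

# additivity: base value + per-feature contributions should equal the raw logit margin
margin = model.predict(X, output_margin=True)
err = np.abs(explainer.expected_value + sv.sum(axis=-1) - margin).max()
print(f"max additivity error: {err:.2e}")  # should be near zero when additivity holds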

Brainstorming

This makes me wonder whether there is some numpy or numpy-adjacent usage in FastTreeSHAP that could be reimplemented to stay compatible with newer versions. If so, I would be glad to contribute, provided someone can help restrict the scope of the problem (in case I can't restrict it quickly enough myself); one scoping idea is sketched below. As we all know from #14, fixing the root cause would be highly preferable to pinning numpy versions. My instinct tells me this may trace back to an existing NumPy release note or issue, but I haven't looked too deeply yet.
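
One way to start narrowing it down, continuing from the sketch above and assuming the algorithm argument described in the FastTreeSHAP README, would be to check whether only the fast computation paths are affected:

# if only the fast paths ("v1"/"v2") lose additivity on numpy 1.23.*
# while the vanilla TreeSHAP path ("v0") stays exact, the bug is likely
# in the fast-path code rather than in logic inherited from shap
for algo in ("v0", "v1", "v2"):
    ex = fasttreeshap.TreeExplainer(model, algorithm=algo)
    sv = np.asarray(ex.shap_values(X, check_additivity=False))
    err = np.abs(ex.expected_value + sv.sum(axis=-1) - margin).max()
    print(algo, err)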

Good Release

In any case, I still think the decision to relax the numpy constraint was the right one; anyone who runs into the same behavior can simply pin the version themselves in the short term.
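
For example, the short-term pin can be a single line in requirements.txt (the upper bound reflects the versions that worked for me):

numpy<1.23  # 1.21.6 and 1.22.4 both worked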

Environment Info

Partial result of pip list

Package                       Version
----------------------------- -----------
fasttreeshap                  0.1.3
pandas                        1.5.2
numba                         0.56.4
numpy                         1.23.5
setuptools                    65.6.3
shap                          0.41.0
wheel                         0.38.4
xgboost                       1.7.1

uname --all

Linux ip-**-**-*-*** 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             3102.640
BogoMIPS:            5000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni

Thanks for your responsiveness to the community!

Nandish Gupta, Senior AI Engineer, SolasAI

jlyang1990 commented 1 year ago

Hi Nandish, thanks for raising this issue and providing some temporary resolutions. I just checked shap GitHub page, and found out some recent issues related to additivity, e.g., https://github.com/slundberg/shap/issues/2778 and https://github.com/slundberg/shap/issues/2777. Just curious whether you have tried your dataset on shap as well and whether the additivity issue still exist. Thanks!