suinleelab / treeexplainer-study

Code and documentation for experiments in the TreeExplainer paper

Installation of required packages as per https://github.com/slundberg/shap results in import errors #2

Open sebastian-lapuschkin opened 3 years ago

sebastian-lapuschkin commented 3 years ago

... and version mismatches.

I installed the listed packages into a conda env called "shap" using conda install -c conda-forge shap. After cloning https://github.com/suinleelab/treeexplainer-study, installing Jupyter and launching jupyter lab / jupyter notebook, I get the following error in the first (import) cell of notebooks/mortality/NHANES I Analysis.ipynb:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-92f169e55cd2> in <module>
      2 import shap
      3 from sklearn.model_selection import train_test_split
----> 4 from sklearn.preprocessing import StandardScaler, Imputer
      5 import sklearn
      6 import matplotlib.pyplot as pl

ImportError: cannot import name 'Imputer' from 'sklearn.preprocessing' (/home/lapuschkin/miniconda3/envs/shap/lib/python3.8/site-packages/sklearn/preprocessing/__init__.py)

This seems to be caused by the notebook relying on scikit-learn 0.19.1 (or older), while the described installation routine has been updated in the meantime and now installs scikit-learn 0.24. Is there an easy and convenient way to install the package versions required to run the experiments in this repo?

Thank you for making your code publicly available.

sebastian-lapuschkin commented 3 years ago

Dear all,

I had to make a few small changes to get the code to run: some while setting up the (1) Python environment, some in the (2) notebook itself. The changes are summarized below:

(1) changes during setup:

export CONDA_ALWAYS_YES="true"

# note: this is a fresh conda install.

conda create -n shap
conda activate shap

conda install -c conda-forge shap

# install further required packages and software
# packages installed via pip could not be resolved via conda

conda install jupyter
pip install xgboost 
conda install keras
pip install lifelines
conda install mpld3 # relevant for NHANES Nonlinearity
conda install statsmodels # relevant for NHANES Nonlinearity

# version mismatch between notebook and shap env:
# pinning the old scikit-learn release would be one option,
#   conda remove scikit-learn
#   conda install scikit-learn=0.19.1
# but in the end I resolved it by adapting the notebook code instead.

unset CONDA_ALWAYS_YES 
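
To see which versions the setup above actually resolved to before opening the notebooks, a small check along these lines may help. This is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`), the helper name `installed_versions` is made up for illustration, and the distribution names in `PACKAGES` are my guess at how the packages are registered in the environment.

```python
from importlib import metadata

# Distribution names are an assumption; adjust them if your environment
# registers a package under a different name.
PACKAGES = [
    "shap", "scikit-learn", "xgboost", "keras",
    "lifelines", "mpld3", "statsmodels",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so mismatches are easy to spot.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

A scikit-learn version of 0.22 or newer in this listing would explain the ImportError on `Imputer` reported above.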

(2) changes in the notebook:

# from sklearn.preprocessing import StandardScaler, Imputer  # worked in v0.19; Imputer was deprecated in v0.20 and removed in v0.22
from sklearn.preprocessing import StandardScaler  # v0.24.1
from sklearn.impute import SimpleImputer  # v0.24.1; replaces Imputer from v0.19
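
If the notebook should keep running on both old and new scikit-learn releases, the import could be resolved at runtime instead of being hard-coded. The following is a minimal sketch, not what the attached notebook does; `first_available` is a hypothetical helper name, and it uses only the standard library.

```python
import importlib


def first_available(candidates):
    """Return the first importable attribute from a list of (module, attr) pairs.

    Candidates are tried in order; a missing module or attribute moves on
    to the next pair. ImportError is raised if none of them can be resolved.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of {candidates!r} could be imported")


# Intended use in the notebook's import cell (requires scikit-learn):
#
#   ImputerClass = first_available([
#       ("sklearn.impute", "SimpleImputer"),   # newer releases
#       ("sklearn.preprocessing", "Imputer"),  # v0.19 and older
#   ])
```

Note that the two classes are not drop-in replacements for each other: as far as I know, the old `Imputer` took an `axis` argument and defaulted to `missing_values="NaN"`, whereas `SimpleImputer` has no `axis` argument and defaults to `missing_values=np.nan`, so any call that passes those arguments still needs adapting.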

I will upload the updated notebook(s) here once I have verified that everything works. That being said, is there a rough estimate of how long the TreeExplainer takes to produce results (with respect to number of cores / CPU clock speed)? I am currently running the code on a Xeon server CPU with 20 (logical) cores.

best,

sebastian-lapuschkin commented 3 years ago

FYI, the fixed notebook is attached. Some weirdness in the order of cells remains, which should be resolvable by following the order of execution in the original notebook. Note that the results diverge marginally from the original ones.

NHANES I Analysis.zip