yangalan123 / Amortized-Interpretability

Codebase for the ACL 2023 paper "Efficient Shapley Values Estimation by Amortization for Text Classification"
MIT License

Thermostat library not installing in conda environment with python 3.7.12 #4

Closed by sideDesert 4 months ago

sideDesert commented 4 months ago

I am installing the packages manually as I go and trying to run thermostat/run_explainer.py using the bash command provided in the documentation. The first issue I encountered is that installing the thermostat library with pip (pip install thermostat-datasets) fails with an error.

Expected Behavior

The thermostat library should install without throwing any errors.

Current Behavior

It throws an error stating that the sklearn PyPI package is deprecated:

pip install thermostat-datasets
Collecting thermostat-datasets
  Using cached thermostat_datasets-1.1.0-py3-none-any.whl.metadata (1.8 kB)
Collecting captum>=0.3 (from thermostat-datasets)
  Using cached captum-0.7.0-py3-none-any.whl.metadata (26 kB)
Requirement already satisfied: datasets>=1.5 in ./py37/lib/python3.7/site-packages (from thermostat-datasets) (2.13.2)
Collecting jsonnet (from thermostat-datasets)
  Using cached jsonnet-0.20.0.tar.gz (594 kB)
  Preparing metadata (setup.py) ... done
INFO: pip is looking at multiple versions of thermostat-datasets to determine which version is compatible with other requirements. This could take a while.
Collecting thermostat-datasets
  Using cached thermostat_datasets-1.0.2.1-py3-none-any.whl.metadata (1.8 kB)
Collecting jsonnet-binary (from thermostat-datasets)
  Using cached jsonnet_binary-0.17.0-cp37-cp37m-manylinux2010_x86_64.whl.metadata (817 bytes)
Requirement already satisfied: numpy>=1.20 in ./py37/lib/python3.7/site-packages (from thermostat-datasets) (1.21.6)
Collecting overrides (from thermostat-datasets)
  Using cached overrides-7.7.0-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: pandas in ./py37/lib/python3.7/site-packages (from thermostat-datasets) (1.3.5)
Collecting protobuf (from thermostat-datasets)
  Using cached protobuf-4.24.4-cp37-abi3-manylinux2014_x86_64.whl.metadata (540 bytes)
Collecting pytorch-ignite (from thermostat-datasets)
  Using cached pytorch_ignite-0.5.0.post2-py3-none-any.whl.metadata (27 kB)
Collecting scipy (from thermostat-datasets)
  Using cached scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (2.2 kB)
Collecting sentencepiece (from thermostat-datasets)
  Using cached sentencepiece-0.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting sklearn (from thermostat-datasets)
  Using cached sklearn-0.0.post12.tar.gz (2.6 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.

      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error

      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Possible Solution

Updating to Python 3.8 does fix the installation problem for this package, once scikit-learn has been installed.
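
The error output above also names a last-resort workaround: setting the environment variable SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True before running pip. A minimal sketch of that, assuming the variable behaves as the message describes; the helper names build_install_env and pip_install are hypothetical, and I have not verified that the install then succeeds on Python 3.7:

```python
import os
import subprocess
import sys

# Variable name copied verbatim from the pip error message.
ALLOW_FLAG = "SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL"


def build_install_env(base_env=None):
    """Return a copy of the environment with the opt-in flag set to "True"."""
    env = dict(os.environ if base_env is None else base_env)
    env[ALLOW_FLAG] = "True"
    return env


def pip_install(package, dry_run=True):
    """Hypothetical helper: build the pip command and run it only if dry_run is False."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        # check=True raises CalledProcessError if pip exits with a non-zero code.
        subprocess.run(cmd, env=build_install_env(), check=True)
    return cmd


if __name__ == "__main__":
    # Dry run by default: print the commands instead of modifying the environment.
    for pkg in ("scikit-learn", "thermostat-datasets"):
        print(" ".join(pip_install(pkg)))
```

Calling pip_install("thermostat-datasets", dry_run=False) would run the real install with the flag set. This only suppresses the error from the deprecated sklearn placeholder package; it does not address any other dependency problem on Python 3.7.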

Steps to Reproduce

  1. Create a conda environment for Python 3.7. I used the command conda create -p ./py37 python=3.7
  2. pip install transformers datasets numpy torch tqdm (I installed them using separate pip commands)
  3. pip install thermostat-datasets

Context (Environment)

Pip List

Package                  Version
------------------------ -----------
aiohttp                  3.8.6
aiosignal                1.3.1
async-timeout            4.0.3
asynctest                0.13.0
attrs                    23.2.0
certifi                  2024.2.2
charset-normalizer       3.3.2
datasets                 2.13.2
dill                     0.3.6
filelock                 3.12.2
frozenlist               1.3.3
fsspec                   2023.1.0
huggingface-hub          0.16.4
idna                     3.7
importlib-metadata       6.7.0
joblib                   1.3.2
multidict                6.0.5
multiprocess             0.70.14
numpy                    1.21.6
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
packaging                24.0
pandas                   1.3.5
Pillow                   9.5.0
pip                      24.0
pyarrow                  12.0.1
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.1
regex                    2024.4.16
requests                 2.31.0
safetensors              0.4.3
scikit-learn             1.0.2
scipy                    1.7.3
setuptools               65.6.3
six                      1.16.0
threadpoolctl            3.1.0
tokenizers               0.13.3
torch                    1.13.1
torchvision              0.14.1
tqdm                     4.66.4
transformers             4.30.2
typing_extensions        4.7.1
urllib3                  2.0.7
wheel                    0.34.2
xxhash                   3.4.1
yarl                     1.9.4
zipp                     3.15.0

Pip version: 24.0
Conda version: 23.1.0
Env Python version: 3.7.12

yangalan123 commented 4 months ago

Thanks for the feedback and the detailed reproduction steps! As I explained in #3, the upstream dependencies do appear to be broken, perhaps on both Python 3.7 and Python 3.8. I think a more sustainable approach would be to migrate to Python 3.9 or 3.10. I will do some testing once I get a machine with an M2 chip. Let's close this issue for now and focus our discussion in #3 and #5.