yangalan123 / Amortized-Interpretability

Codebase for the ACL 2023 paper "Efficient Shapley Values Estimation by Amortization for Text Classification"
MIT License

Thermostat execution stuck at 0 #5

Closed sideDesert closed 3 months ago

sideDesert commented 3 months ago

I am trying to run thermostat/run_explainer.py using the bash command given in the documentation. After downloading the dataset and then running the file, execution seems to get stuck at 0 after some time.

Expected Behavior

Shapley values to be calculated by thermostat without any errors.

Current Behavior

Code execution seems to stop at 0 and doesn't move forward.

2024-05-18 15:22:53,484 -explain - INFO - (Progress) Starting explanation with config file: configs/yelp_polarity/bert/svs-3600.jsonnet
2024-05-18 15:22:53,572 -explain - INFO - (Config) Config: 
{
  "dataset": {
    "batch_size": 1,
    "columns": [
      "input_ids",
      "attention_mask",
      "special_tokens_mask",
      "token_type_ids",
      "labels"
    ],
    "end": 3600,
    "name": "yelp_polarity",
    "root_dir": "./experiments/thermostat/datasets",
    "split": "test"
  },
  "device": "cuda",
  "explainer": {
    "internal_batch_size": 1,
    "n_samples": 25,
    "name": "ShapleyValueSampling"
  },
  "model": {
    "mode_load": "hf",
    "name": "textattack/bert-base-uncased-yelp-polarity",
    "path_model": null,
    "tokenization": {
      "max_length": 512,
      "padding": "max_length",
      "return_tensors": "np",
      "special_tokens_mask": true,
      "truncation": true
    }
  },
  "path": "./experiments/thermostat",
  "visualization": {
    "columns": [
      "attributions",
      "predictions",
      "input_ids",
      "labels"
    ],
    "gamma": 2,
    "normalize": true
  },
  "experiment_path": "./experiments/thermostat/yelp_polarity/bert/svs-3600"
}
2024-05-18 15:22:53,572 -explain - INFO - (File I/O) Output file: ./experiments/thermostat/yelp_polarity/bert/svs-3600/seed_1/2024-05-18-15-22-53.ShapleyValueSampling.jsonl
2024-05-18 15:22:53,577 -explain - INFO - (Config) Explaining on device: cpu
/Users/siddarth.saha/Desktop/dev/dev-python/iml/shap/Amortized-Interpretability/py38/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2024-05-18 15:22:56,346 -explain - INFO - (Progress) Loaded explainer
2024-05-18 15:22:56,347 -explain - INFO - (Progress) Initialized data loader
========= Namespace(bsz=10, c='configs/yelp_polarity/bert/svs-3600.jsonnet', home='.', seed=1) ========
========== configs_yelp_polarity_bert_svs-3600

  0%|          | 0/360 [00:00<?, ?it/s]2024-05-18 15:22:56,352 -explain - INFO - (Progress) Processing batch 0 / instance 0

Possible Solution

Steps to Reproduce

  1. Create a conda environment for Python 3.8 - conda create -p ./py38 python=3.8
  2. Activate the conda environment - conda activate ./py38
  3. Manually install these packages using pip - transformers, numpy, tqdm, torch, scikit-learn, thermostat-datasets
  4. Create the directory thermostat/experiments/thermostat/yelp_polarity/bert/svs-3600
  5. Create the directory thermostat/experiments/datasets/yelp_polarity
  6. Run python thermostat/create_datasets.py
  7. Run the bash command within the thermostat folder - bash run.sh task=yelp_polarity model=bert explainer=svs-3600 seed=1 batch_size=1 device=0
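Steps 4-5 above can be collapsed into a pair of mkdir -p calls, which also create any missing parent directories (paths as given in the steps; run from the directory containing the thermostat checkout):

```shell
# Create the experiment output directory and the dataset directory in one go;
# -p creates intermediate parents and is a no-op if the path already exists.
mkdir -p thermostat/experiments/thermostat/yelp_polarity/bert/svs-3600
mkdir -p thermostat/experiments/datasets/yelp_polarity
```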

Context (Environment)

Python version - 3.8.19 (Conda Environment) Machine - MacBook Air M2 macOS Sonoma 14.4 pip version 24.0

Pip List

Package                  Version
------------------------ -----------
aiohttp                  3.9.5
aiosignal                1.3.1
annotated-types          0.6.0
async-timeout            4.0.3
attrs                    23.2.0
blis                     0.7.11
captum                   0.7.0
catalogue                2.0.10
certifi                  2024.2.2
charset-normalizer       3.3.2
click                    8.1.7
cloudpathlib             0.16.0
confection               0.1.4
contourpy                1.1.1
cycler                   0.12.1
cymem                    2.0.8
datasets                 2.19.1
dill                     0.3.8
filelock                 3.14.0
fonttools                4.51.0
frozenlist               1.4.1
fsspec                   2024.3.1
huggingface-hub          0.23.0
idna                     3.7
importlib_resources      6.4.0
Jinja2                   3.1.4
joblib                   1.4.2
jsonnet                  0.20.0
kiwisolver               1.4.5
langcodes                3.4.0
language_data            1.2.0
marisa-trie              1.1.1
MarkupSafe               2.1.5
matplotlib               3.7.5
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
murmurhash               1.0.10
networkx                 3.1
numpy                    1.24.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
overrides                7.7.0
packaging                24.0
pandas                   2.0.3
pillow                   10.3.0
pip                      24.0
preshed                  3.0.9
protobuf                 5.26.1
pyarrow                  16.1.0
pyarrow-hotfix           0.6
pydantic                 2.7.1
pydantic_core            2.18.2
pyparsing                3.1.2
python-dateutil          2.9.0.post0
pytorch-ignite           0.5.0.post2
pytz                     2024.1
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.31.0
safetensors              0.4.3
scikit-learn             1.3.2
scipy                    1.10.1
sentencepiece            0.2.0
setuptools               69.5.1
six                      1.16.0
smart-open               6.4.0
spacy                    3.7.4
spacy-legacy             3.0.12
spacy-loggers            1.0.5
srsly                    2.4.8
sympy                    1.12
thermostat-datasets      1.1.0
thinc                    8.2.3
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.3.0
tqdm                     4.66.4
transformers             4.41.0
triton                   2.3.0
typer                    0.9.4
typing_extensions        4.11.0
tzdata                   2024.1
urllib3                  2.2.1
wasabi                   1.1.2
weasel                   0.3.4
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4
zipp                     3.18.2
sideDesert commented 3 months ago

I have added a few additional logs, which are reflected in the Current Behaviour section; otherwise I have made no code changes to the file.

yangalan123 commented 3 months ago

Hi, thanks for the feedback -- I was dealing with the dependency problem and just pushed an update to the repo. It seems you have already addressed the dependency problem in this issue. There is one thing I want to clarify in the reproduction steps: there is no create_dataset.py in ./thermostat -- I guess you mean download_data.py? You should run python download_data.py -c configs/yelp_polarity/bert/svs-3600.jsonnet -home . first, before running run_explainer.py.

Then, as for the stuck run you observed: although I cannot run experiments on my borrowed MacBook, I successfully replicated it on a CPU-only machine. I intentionally set n_samples in thermostat/configs/yelp_polarity/bert/svs-3600.jsonnet to 1 for debugging purposes, and a single instance took 11 minutes (in comparison, the same config on an A40 GPU takes only 6 seconds, about 110x faster). So with the original svs-3600.jsonnet config (n_samples = 25), a single instance would take about 11 * 25 / 60 ≈ 4.58 hours on CPU! That is why the program looks "stuck" -- it is not really stuck, it just runs very slowly.
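The arithmetic above can be checked in a couple of lines (all numbers taken from this comment):

```python
# Back-of-envelope check of the CPU runtime estimate quoted above.
minutes_per_instance_n1 = 11      # one instance with n_samples = 1, CPU
n_samples = 25                    # value in the original svs-3600.jsonnet
gpu_seconds = 6                   # same n_samples = 1 config on an A40 GPU

cpu_hours = minutes_per_instance_n1 * n_samples / 60
speedup = minutes_per_instance_n1 * 60 / gpu_seconds

print(f"~{cpu_hours:.2f} h per instance on CPU, ~{speedup:.0f}x GPU speedup")
# → ~4.58 h per instance on CPU, ~110x GPU speedup
```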

The thermostat code is not optimized for running on a CPU or on an M2 chip, and running a BERT-sized language model takes a long time without a dedicated accelerator such as a GPU: the code here is ordinary PyTorch/Hugging Face code with no CPU- or M2-specific optimization. To assist with reproduction, I have uploaded my computed svs-3600-style output files and will update the repo accordingly. Check out README.md.
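To see why runtime scales with n_samples, here is a minimal pure-Python sketch of permutation-based Shapley value sampling -- the estimator that Captum's ShapleyValueSampling implements. This is a toy with a linear model, not the thermostat/BERT pipeline: each sampled permutation costs one model evaluation per feature, so wall-clock time grows linearly with n_samples and with input length (512 tokens in the config above), which is exactly the 1-vs-25 cost ratio discussed.

```python
import random

def shapley_sampling(f, x, baseline, n_samples, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled feature permutations."""
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_samples):
        perm = list(range(d))
        rng.shuffle(perm)
        current = list(baseline)
        prev = f(current)                 # 1 + d model calls per permutation,
        for i in perm:                    # hence cost is linear in n_samples
            current[i] = x[i]
            cur = f(current)
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]

# Toy linear model: for f(z) = sum(w_i * z_i) with a zero baseline,
# the exact Shapley value of feature i is w_i * x_i.
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
print(shapley_sampling(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], n_samples=25))
# → [2.0, -2.0, 1.5]
```

For a linear model every permutation yields the same marginal contribution, so the estimate is exact; for a deep model like BERT the variance shrinks as n_samples grows, which is why the config uses 25 samples despite the cost.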

Also, if you want to replicate my further experiments, such as training the amortized model, I would suggest running on a GPU rather than an M2 chip, as none of my code is optimized for the M2 chip either. Unfortunately, I do not have the appropriate machine to do that and do not have the bandwidth to implement M2 support.

yangalan123 commented 3 months ago

Closing this issue as the problem appears to be solved after offline discussion. Feel free to reopen it if needed.