salesforce / OmniXAI

OmniXAI: A Library for eXplainable AI
BSD 3-Clause "New" or "Revised" License

About the code efficiency #21

Closed RichardHGL closed 2 years ago

RichardHGL commented 2 years ago

Hi, thanks for this useful package.

I'd like to check how long it takes you to run 'nlp.ipynb' in the tutorials folder. It takes more than 10 minutes on my server. Is something wrong?

yangwenz commented 2 years ago

Please try running the code on a GPU. SHAP for text is not very efficient, so we recommend using integrated gradients to generate explanations for large language models. Here are some examples: https://github.com/salesforce/OmniXAI/blob/main/tutorials/nlp_imdb.ipynb https://github.com/salesforce/OmniXAI/blob/main/tutorials/vision/ig_vlm.ipynb

yangwenz commented 2 years ago

If you run this code for the first time, it will download the large pretrained language model, which takes extra time. Could you run the script again after the model has been downloaded?

RichardHGL commented 2 years ago

Okay, I will try it again later. Besides, I also hit a bug with polyjuice:

Traceback (most recent call last):
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/omnixai/explainers/base.py", line 215, in explain
    explanations[name] = self.explainers[name].explain(X=X, **param)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/omnixai/explainers/nlp/counterfactual/polyjuice.py", line 173, in explain
    return self._explain_classification(X=X, max_number_examples=max_number_examples, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/omnixai/explainers/nlp/counterfactual/polyjuice.py", line 96, in _explain_classification
    perturb_texts = self._perturb(text.lower(), **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/omnixai/explainers/nlp/counterfactual/polyjuice.py", line 73, in _perturb
    perturb_texts = self.explainer.perturb(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/polyjuice/polyjuice_wrapper.py", line 247, in perturb
    generated = generate_on_prompts(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/polyjuice/generations/generator_helpers.py", line 69, in generate_on_prompts
    total_sequence = s["generated_text"].split(PERETURB_TOK)[-1]
KeyError: 'generated_text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_unfair_tos.py", line 443, in <module>
    main()
  File "test_unfair_tos.py", line 401, in main
    local_explanations = explainer.explain(x)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/omnixai/explainers/base.py", line 217, in explain
    raise type(e)(f"Explainer {name} -- {str(e)}")
KeyError: "Explainer polyjuice -- 'generated_text'"

RichardHGL commented 2 years ago

I tried running the explainers separately: ["shap"], ["lime"], and ["polyjuice"]. SHAP is much faster, while LIME takes much more time on the example above (polyjuice still shows the error).
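A simple way to compare the explainers' runtimes is to time each configuration separately. The sketch below is generic: `explain_fn` is a hypothetical stand-in for any explainer call (e.g. an `NLPExplainer` configured with a single explainer), not an OmniXAI API.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best-of-N wall-clock time (seconds) for calling fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-in for a real explainer's explain(x) call.
def explain_fn(x):
    return [len(x)]

elapsed = time_call(explain_fn, "some input text")
print(f"explain took {elapsed:.6f}s")
```

Using best-of-N rather than a single run reduces noise from first-call overhead such as model loading or JIT warm-up.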

Would you suggest using NLPExplainer as in 'nlp.ipynb', or directly using from omnixai.explainers.nlp import LimeText as in nlp/lime.ipynb? Is there any difference in efficiency?

yangwenz commented 2 years ago

There is no difference in efficiency between NLPExplainer and the individual explainers; NLPExplainer just acts as an explainer factory. Which version of polyjuice is installed?

RichardHGL commented 2 years ago

polyjuice-nlp 0.1.5

yangwenz commented 2 years ago

Thanks a lot! We will check this issue.

RichardHGL commented 2 years ago

I just found the solution from https://github.com/tongshuangwu/polyjuice/issues/10
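For readers hitting the same KeyError: one commonly reported cause for this pattern is a version mismatch with the transformers text-generation pipeline, which in some releases returns a nested list of result dicts rather than a flat one, so indexing an element with s["generated_text"] fails. Whether this matches the fix in the linked issue is an assumption; the sketch below only illustrates the parsing difference with a hypothetical flatten_outputs helper (not part of polyjuice or OmniXAI).

```python
# Illustration of why s["generated_text"] can raise KeyError:
# some transformers pipeline versions return [{"generated_text": ...}, ...]
# per prompt, while others nest the results one level deeper:
# [[{"generated_text": ...}, ...]].
# flatten_outputs is a hypothetical helper, not a polyjuice/OmniXAI API.

def flatten_outputs(outputs):
    """Flatten pipeline results so each item is a dict with 'generated_text'."""
    flat = []
    for item in outputs:
        if isinstance(item, dict):
            flat.append(item)
        elif isinstance(item, list):
            flat.extend(x for x in item if isinstance(x, dict))
    return flat

# Both the flat and the nested shape parse the same way after flattening:
flat_style = [{"generated_text": "prompt [SEP] a counterfactual"}]
nested_style = [[{"generated_text": "prompt [SEP] a counterfactual"}]]

for outputs in (flat_style, nested_style):
    for s in flatten_outputs(outputs):
        print(s["generated_text"].split("[SEP]")[-1].strip())
```

Pinning compatible library versions, as discussed in the linked issue, is the cleaner fix; the defensive parsing above is only meant to show where the shape mismatch occurs.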

yangwenz commented 2 years ago

Thanks for raising this issue. We will add it to our backlog for future updates.