inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0

Does inseq work with Mistral-7B or Llama-2-7B models? #265

Closed · hadiasghari closed this issue 7 months ago

hadiasghari commented 7 months ago

Question

Hello, I was wondering whether inseq works with either Mistral-7B or Llama-2-7B. I can load the models without issue and run the tutorial's minimal pair example ("The manager went home because..."), but the resulting saliency heatmap looks quite off (screenshot below).

Additional context

I ran the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import inseq

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # or "meta-llama/Llama-2-7b-chat-hf"
access_token = "XXX"  # needed because the models are gated
hf_model = AutoModelForCausalLM.from_pretrained(model_name, token=access_token).cuda()
hf_tkz = AutoTokenizer.from_pretrained(model_name, token=access_token)

# Attribute both continuations with integrated gradients, then visualize
# their difference with the pair aggregator.
attrib_model = inseq.load_model(hf_model, "integrated_gradients", tokenizer=hf_tkz)
out = attrib_model.attribute(
    input_texts=["The manager went home because", "The manager went home because"],
    generated_texts=["The manager went home because he was sick.", "The manager went home because she was sick."],
    step_scores=["probability"],
)
out[0].aggregate("pair", paired_attr=out[1], do_post_aggregation_checks=False).show()

This produces the saliency heatmap below. The probabilities in the last row look correct, but one would expect the manager/he→she cell to be strongly highlighted (very red), which is not the case.

[Screenshot: saliency heatmap for the Mistral minimal pair, 2024-04-22]

Thanks!

gsarti commented 7 months ago

Hi @hadiasghari, thank you for your interest in Inseq!

As your example shows, Inseq does work with the models you mention. However, you are computing integrated gradients for the two continuations separately and then using the pair aggregator to visualize their difference, which is often not very informative. Contrastive attribution is more likely to give you a meaningful result in this case. Quoting from our tutorial:

While PairAggregator can be used to visualize the difference between two attribution outputs, using the difference in probability between an option A (e.g. he in the previous example) and option B (e.g. she) as a target for gradient-based attribution methods is a more principled way to obtain contrastive explanations answering the question "How is this feature X contributing to the prediction of A rather than B?".
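In plain transformers terms, the contrastive target described above is just the difference between the model's probability of the two options after the shared prefix. Below is a hedged sketch (not the Inseq API) that reuses hf_model and hf_tkz from the original snippet; the assumption that " he" and " she" each map to a single trailing token may not hold for every tokenizer.

import torch

# Hedged sketch: compute p(" he") - p(" she") as the next token after the
# shared prefix, i.e. the quantity the contrastive target is built from.
prefix = "The manager went home because"
inputs = hf_tkz(prefix, return_tensors="pt").to(hf_model.device)
with torch.no_grad():
    next_token_logits = hf_model(**inputs).logits[0, -1]
next_token_probs = next_token_logits.softmax(-1)

# Last token id of each option (leading space matters for most tokenizers).
id_he = hf_tkz(" he", add_special_tokens=False).input_ids[-1]
id_she = hf_tkz(" she", add_special_tokens=False).input_ids[-1]
diff = (next_token_probs[id_he] - next_token_probs[id_she]).item()
print(f"p(he) - p(she) = {diff:.4f}")

Gradient-based contrastive attribution then asks how each input feature contributes to exactly this difference, rather than to the probability of a single continuation.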

However, we do not currently support integrated gradients for contrastive attribution, since the operation needed to expand the steps for the contrastive target is not yet implemented. Using saliency instead, an example for your case would be:

from transformers import AutoModelForCausalLM, AutoTokenizer
import inseq

model_name = "meta-llama/Llama-2-7b-chat-hf"
access_token = "XXX"  # needed because the model is gated
hf_model = AutoModelForCausalLM.from_pretrained(model_name, token=access_token).cuda()
hf_tkz = AutoTokenizer.from_pretrained(model_name, token=access_token)

# Attribute the "he" continuation against the "she" continuation, using the
# probability difference between the two as the attribution target.
attrib_model = inseq.load_model(hf_model, "saliency", tokenizer=hf_tkz)
out = attrib_model.attribute(
    input_texts="The manager went home because",
    generated_texts="The manager went home because he was sick.",
    contrast_targets="The manager went home because she was sick.",
    attributed_fn="contrast_prob_diff",
    step_scores=["probability", "contrast_prob_diff"],
)
out.show()

Let me know if this works for you!

gsarti commented 7 months ago

Hi @hadiasghari, any update on this? Can I close the issue?

hadiasghari commented 7 months ago

Dear @gsarti, thank you for your answer. I am not sure I fully understand the explanation, since the same code works for the GPT-2 model, but it's probably best that I follow this up on Discord and close the issue here.
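For reference, the contrastive recipe from the reply above can be sanity-checked on GPT-2, which is not gated, so the model name string can be passed to load_model directly. This is a hedged sketch mirroring the earlier snippet, not output from this thread.

import inseq

# Same contrastive saliency recipe as above, run on the much smaller GPT-2
# model as a quick sanity check of the overall setup.
gpt2_model = inseq.load_model("gpt2", "saliency")
out = gpt2_model.attribute(
    input_texts="The manager went home because",
    generated_texts="The manager went home because he was sick.",
    contrast_targets="The manager went home because she was sick.",
    attributed_fn="contrast_prob_diff",
    step_scores=["probability", "contrast_prob_diff"],
)
out.show()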