Open Ningshiqi opened 3 weeks ago
Hi @Ningshiqi, I understand the plot was generated with Captum v0.7, but I'm not sure what I'm looking at exactly. Are the columns different examples concatenated with their labels? What do scores for "Positive" represent for examples where the final token is "Negative"? Could you please provide a code snippet used to generate this figure?
1. This diagram was indeed generated using Captum v0.7.
2. The few-shot examples are concatenated with their labels, and the official code below confirms this.
3. In the official demo (official demo url), a quote from this case:
"Interestingly, we can see all these few-shot examples we choose actually make the model less likely to correctly label the given review as "Positive".
The model did generate the correct Positive label for this movie review, but the scores in the graph show that the four few-shot examples did not contribute to that Positive label.
The code from the official demo is below.
from captum.attr import ShapleyValues, LLMAttribution, TextTemplateInput

# model and tokenizer are the HF model and tokenizer loaded earlier in the demo
sv = ShapleyValues(model)
sv_llm_attr = LLMAttribution(sv, tokenizer)

def prompt_fn(*examples):
    main_prompt = "Decide if the following movie review enclosed in quotes is Positive or Negative:\n'I really liked the Avengers, it had a captivating plot!'\nReply only Positive or Negative."
    subset = [elem for elem in examples if elem]
    if not subset:
        prompt = main_prompt
    else:
        prefix = "Here are some examples of movie reviews and classification of whether they were Positive or Negative:\n"
        prompt = prefix + " \n".join(subset) + "\n " + main_prompt
    return "[INST] " + prompt + "[/INST]"

input_examples = [
    "'The movie was ok, the actors weren't great' Negative",
    "'I loved it, it was an amazing story!' Positive",
    "'Total waste of time!!' Negative",
    "'Won't recommend' Negative",
]

inp = TextTemplateInput(
    prompt_fn,
    values=input_examples,
)

attr_res = sv_llm_attr.attribute(inp)
attr_res.plot_token_attr(show=True)
I think the way Captum implements this is to index each few-shot example and then aggregate the token scores of each example to produce a per-example "Positive" score, right?
If we use Inseq to achieve the same effect, I understand we should also take the start and end positions of each example and aggregate over them. Is this possible?
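To make the kind of aggregation I mean concrete, here is a rough sketch (not Captum's actual implementation; the token scores and example spans below are made up) of summing token-level scores over the start/end span of each example:

import numpy as np

# Hypothetical token-level attribution scores for a 12-token prompt
token_scores = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, 0.4, -0.3, 0.1, 0.05, -0.05])

# Hypothetical (start, end) token spans, one per few-shot example
example_spans = [(0, 3), (3, 6), (6, 9), (9, 12)]

# One aggregated score per example, obtained by summing over its span
example_scores = [token_scores[start:end].sum() for start, end in example_spans]
print(example_scores)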
Ah I see, so the examples are actually in-context demonstrations, and the scores represent the aggregated contribution for the full example. If you install from main (I fixed LLaMA 3.2 support in #289) you can do something like:
import transformers
import inseq

# Create prompt: same as in Captum example
def prompt_fn(*examples):
    main_prompt = "Decide if the following movie review enclosed in quotes is Positive or Negative:\n'I really liked the Avengers, it had a captivating plot!'\nReply only Positive or Negative."
    subset = [elem for elem in examples if elem]
    if not subset:
        prompt = main_prompt
    else:
        prefix = "Here are some examples of movie reviews and classification of whether they were Positive or Negative:\n"
        prompt = prefix + " \n".join(subset) + "\n " + main_prompt
    return prompt

input_examples = [
    "'The movie was ok, the actors weren't great' Negative",
    "'I loved it, it was an amazing story!' Positive",
    "'Total waste of time!!' Negative",
    "'Won't recommend' Negative",
]

prompt = prompt_fn(*input_examples)
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tok = transformers.AutoTokenizer.from_pretrained(model_name)
fmt_prompt = tok.apply_chat_template([{"role": "user", "content": prompt}], tokenize=False, add_generation_prompt=True)
inseq_model = inseq.load_model(model_name, "saliency")

# Absolute-valued contributions
out = inseq_model.attribute(fmt_prompt, generation_args={"max_new_tokens": 5}, clean_special_chars=True)
out.aggregate("subwords", special_chars=("\n\n", "\n", " \n")).show()

# Signed contributions
out = inseq_model.attribute(fmt_prompt, generation_args={"max_new_tokens": 5}, clean_special_chars=True, abs=False)
out.aggregate("subwords", special_chars=("\n\n", "\n", " \n")).aggregate("sum").show(do_aggregation=False)
In this example, the SubwordAggregator takes care of splitting the examples, using end-of-line characters to infer spans. The resulting output looks like this, and is pretty much equivalent to the one produced with Captum (note that the attribution method is different; I included two variants of the call above to compute signed and unsigned contributions).
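As a rough mental model (just an illustration of the idea, not the actual SubwordAggregator code), grouping tokens into spans whenever a token contains one of the special characters could look like this:

def group_by_special_chars(tokens, special_chars=("\n\n", "\n", " \n")):
    """Group token strings into spans, starting a new span after every
    token that contains one of the special characters."""
    spans, current = [], []
    for token in tokens:
        current.append(token)
        if any(char in token for char in special_chars):
            spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans

# Toy example with made-up token strings
print(group_by_special_chars(["'Total", " waste", " of", " time", "!!'", " Negative", " \n", "'Won", "'t", " recommend", "'", " Negative"]))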
Please let me know if this addresses your issue, so that I can proceed to close this!
This is the effect I want! ICL is exactly what I want to evaluate. Let me try it out.
It works, I installed the main branch! Thank you for your timely support!!
I ran into a new problem with out.aggregate("subwords", special_chars=("\n\n", " \n", "\n ")).show(). It seems the prompt did not get split on "\n", " \n", or "\n ", for example around "Decide if the following movie review enclosed in quotes is Positive or Negative:\n'I really liked the Avengers, it had a captivating plot!'\nReply only Positive or Negative."
I used the following code to check the token list:
print("Formatted Prompt:\n", fmt_prompt)
tokens = tok(fmt_prompt, return_tensors='pt')
print("Tokenized Output:\n", tokens)
The output is below:
Formatted Prompt:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
Here are some examples of movie reviews and classification of whether they were Positive or Negative:
'The movie was ok, the actors weren't great' Negative
'I loved it, it was an amazing story!' Positive
'Total waste of time!!' Negative
'Won't recommend' Negative
Decide if the following movie review enclosed in quotes is Positive or Negative:
'I really liked the Avengers, it had a captivating plot!'
Reply only Positive or Negative.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Tokenized Output:
{'input_ids': tensor([[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025,
2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696,
25, 220, 1627, 10263, 220, 2366, 19, 271, 128009,
128006, 882, 128007, 271, 8586, 527, 1063, 10507, 315,
5818, 8544, 323, 24790, 315, 3508, 814, 1051, 45003,
477, 51957, 512, 17773, 383, 5818, 574, 5509, 11,
279, 20142, 15058, 956, 2294, 6, 51957, 720, 42069,
10456, 433, 11, 433, 574, 459, 8056, 3446, 32483,
45003, 720, 17773, 2426, 12571, 315, 892, 3001, 6,
51957, 720, 6, 76936, 956, 7079, 6, 51957, 198,
99981, 422, 279, 2768, 5818, 3477, 44910, 304, 17637,
374, 45003, 477, 51957, 512, 42069, 2216, 15262, 279,
44197, 11, 433, 1047, 264, 86282, 7234, 49827, 21509,
1193, 45003, 477, 51957, 13, 128009, 128006, 78191, 128007,
271]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1]])}
And the "\n" token id is 198, according to the code below:
char = "\n"
token_id = tok.encode(char, add_special_tokens=False)
print(f"Token ID for '{char}':", token_id)
But in the token list, 198 only appears twice, and the locations where it appears do not correspond to where the newlines are in the prompt text. So what can I do to make the split happen at the right places?
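For reference, here is a quick check (using the same tok and fmt_prompt as above) that decodes each token individually and lists the ones that actually contain a newline:

ids = tok(fmt_prompt, add_special_tokens=False)["input_ids"]
for token_id in ids:
    piece = tok.decode([token_id])
    # The tokenizer often merges "\n" with neighbouring characters into a
    # single token, which is why a bare 198 ("\n" alone) rarely appears.
    if "\n" in piece:
        print(token_id, repr(piece))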
I tried the following prompt organization. It worked partially, but there is still a problem.
input_examples = [
"Here are some examples of movie reviews and classification of whether they were Positive or Negative:",
"'The movie was ok, the actors weren't great' Negative",
"'I loved it, it was an amazing story!' Positive",
"'Total waste of time!!' Negative",
"'Won't recommend' Negative",
"'I really liked the Avengers, it had a captivating plot!' Positive",
"Decide if the following movie review enclosed in quotes is Positive or Negative:",
"'I really liked the Avengers, it had a captivating plot!'",
"Reply only Positive or Negative."
]
prompt = " \n".join(input_examples)
The problem is the following: I used print(out.sequence_attributions[0].source) and found that the "\n" I added at the end of the last sentence was not kept as its own token by the tokenizer, and the next token after it was '<|eot_id|>'.
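For clarity, this is how I inspected the source tokens, printing each one with its position (same out object as before):

# Print each source token with its index to see exactly where the newlines ended up
for position, token in enumerate(out.sequence_attributions[0].source):
    print(position, token)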
Hi @Ningshiqi, sorry for the delay! I just pushed a PR with a new aggregator that should simplify splitting on arbitrary strings: #290. Could you look and let me know if this works for you?
I will try it soon and put the result in this comment! Thank you so much for your nice work.
Thanks for your nice work! Does Inseq support attributions with example granularity as in Captum's few-shot setup? Like the demo below.