huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

the attention output from llama2 generate differs from other llama models #31984

Closed manyuanbin closed 2 weeks ago

manyuanbin commented 1 month ago

System Info

transformers: 4.41.2
os: ubuntu
python: 3.10.14

Who can help?

No response

Information

Tasks

Reproduction

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "path/to/vicuna/model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Ensure the model returns attentions
model.config.output_attentions = True
model.config.return_dict = True

input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    max_length=50,
    output_attentions=True,
    return_dict_in_generate=True,
)

generated_ids = outputs.sequences
attention_weights = outputs.attentions

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)

# Print the shape of the attention weights for each layer
for layer_idx, layer_attention in enumerate(attention_weights):
    print(f"Layer {layer_idx} attention shape: {layer_attention.shape}")
```
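Note that `generate` nests the attentions one level deeper than a plain forward pass: `outputs.attentions` is a tuple with one entry per generation step, and each entry is itself a tuple of per-layer tensors. With the default KV cache, only the first step attends over the full prompt; later steps have a query length of 1, which is why the shapes differ from the documented `(batch_size, num_heads, sequence_length, sequence_length)`. A minimal sketch of how to inspect that structure, reusing the checkpoint from the comment below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.1"  # same checkpoint as in the report
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    max_length=50,
    output_attentions=True,
    return_dict_in_generate=True,
)

# outputs.attentions: one entry per generated token,
# each entry a tuple with one tensor per decoder layer
for step_idx, step_attentions in enumerate(outputs.attentions):
    # step 0 (prefill): (batch, num_heads, prompt_len, prompt_len)
    # later steps (with cache): (batch, num_heads, 1, prompt_len + step_idx)
    print(f"step {step_idx}, layer 0 shape: {step_attentions[0].shape}")
```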

Expected behavior

(screenshot attached)
manyuanbin commented 1 month ago

model_name = 'lmsys/vicuna-7b-v1.1'

manyuanbin commented 1 month ago

expected:

attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
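For comparison, a plain forward call (rather than `generate`) does return that documented per-layer shape directly. A minimal PyTorch sketch, assuming the same prompt and checkpoint as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True, return_dict=True)

# out.attentions: one tensor per layer, each of shape
# (batch_size, num_heads, sequence_length, sequence_length)
for layer_idx, layer_attention in enumerate(out.attentions):
    print(f"Layer {layer_idx} attention shape: {layer_attention.shape}")
```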

ArthurZucker commented 1 month ago

hey! Can you update to the latest version of transformers and provide a proper reproduction script? 🤗

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.