unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
16.91k stars 1.16k forks

Can't use output_attentions when using unsloth #515

Open Decycle opened 4 months ago

Decycle commented 4 months ago

I got an assertion error when attempting to run the following code:

with torch.no_grad():
    input_ids = dataset['input_ids'][0]
    output1 = model(input_ids=input_ids, return_dict=True, output_attentions=True)

The error is raised in the LlamaModel_fast_forward method in unsloth/models/llama.py.

What should I do to get the attentions output?

danielhanchen commented 4 months ago

I don't think that'll work :( We use Flash Attention 2 (FA2) and SDPA, so the attention matrix is never actually constructed
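(Editor's note: a small pure-PyTorch sketch of what this means. The fused SDPA kernel returns only the attended output, while the eager path explicitly builds the (seq, seq) weight matrix that output_attentions would expose. The shapes here are illustrative, not Llama's actual dimensions.)

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Fused path (what FA2/SDPA kernels do): only the output tensor
# ever exists -- there is no attention matrix to hand back.
out_fused = F.scaled_dot_product_attention(q, k, v)

# Eager path: the (seq, seq) attention weights are materialized,
# which is what output_attentions=True returns in HF models.
scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
weights = scores.softmax(dim=-1)        # shape (1, 4, 8, 8)
out_eager = weights @ v

# Both paths compute the same output; only eager keeps the weights.
assert torch.allclose(out_fused, out_eager, atol=1e-5)
```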

Paoloc99 commented 1 month ago

Hi, I also need to output the attentions. Did you find a workaround? Thanks

danielhanchen commented 1 month ago

Sorry, it won't work with Unsloth - best to use plain Hugging Face Transformers for now
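(Editor's note: a minimal sketch of that workaround. Loading the model through plain transformers with attn_implementation="eager" forces the non-fused attention path, so output_attentions=True works. The tiny test checkpoint name below is a placeholder assumption; substitute your actual model.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint for illustration -- use your real model name.
name = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(name)
# "eager" attention materializes the weight matrices that
# FA2/SDPA kernels skip, so output_attentions can return them.
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, return_dict=True, output_attentions=True)

# out.attentions is a tuple with one (batch, heads, seq, seq)
# tensor per layer.
print(len(out.attentions), tuple(out.attentions[0].shape))
```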