cramraj8 opened this issue 2 months ago
I was trying to use FlashAttention with replace_with_xformers_attention(), but with recent transformers versions I believe LLaMA can use FlashAttention directly by specifying attn_implementation when loading the pretrained model, so this line is not necessary any more.
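For reference, a minimal sketch of loading the model that way, assuming a recent transformers release (the attn_implementation argument only exists in newer versions; older releases used a different flag). The model id here is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; substitute the checkpoint you are actually using.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FlashAttention kernels need fp16/bf16
    attn_implementation="flash_attention_2",  # let transformers use FlashAttention-2 directly
)
```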
Got it. Thank you.
Hi @MXueguang,
I wonder what the purpose of having replace_with_xformers_attention() defined in utils.py is, because I am getting the following error:
Is the self.num_key_value_heads value used in replace_with_xformers_attention() defined somewhere else?
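For context, in transformers releases that support grouped-query attention, this attribute is set from the model config in LlamaAttention.__init__, so a patched forward that reads it will fail with an AttributeError on older releases. A simplified sketch of where it normally comes from (this is not the utils.py code, just an illustration of the transformers side):

```python
# Simplified from transformers' modeling_llama.py in recent releases,
# shown only to illustrate where self.num_key_value_heads is normally defined.
class LlamaAttentionSketch:
    def __init__(self, config):
        self.num_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        # Added alongside grouped-query attention support; older transformers
        # versions never set this, so any patched forward that references
        # self.num_key_value_heads raises AttributeError there.
        self.num_key_value_heads = config.num_key_value_heads
```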