tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained`. #32

Closed · Essence9999 closed this issue 8 months ago

Essence9999 commented 8 months ago

Ubuntu 22.04. The demo script is as follows:

```python
import torch
from transformers import AutoTokenizer, TextStreamer, GenerationConfig
from attention_sinks import AutoModelForCausalLM

model_id = "/home/work/projects/model/Qwen-7B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    attention_sink_size=4,
    attention_sink_window_size=252,
    trust_remote_code=True,
    use_flash_attn=False,
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token_id = tokenizer.eos_token_id

text = "保持身体健康有多种方式"  # "There are many ways to stay healthy"
input_ids = tokenizer.encode(text, return_tensors="pt").to(model.device)

streamer = TextStreamer(tokenizer)
generation_config = GenerationConfig(
    use_cache=True,
    min_new_tokens=100_000,
    max_new_tokens=1_000_000,
    penalty_alpha=0.6,
    top_k=5,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
generated_tokens = model.generate(
    input_ids,
    generation_config,
    streamer=streamer,
)

output_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
```

Even though I added `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained`, I still get this error: `ValueError: Attention Sinks does not support Flash Attention in QWen models, please use use_flash_attn=False in AutoModelForCausalLM.from_pretrained.`

pip list: [screenshot of installed package versions]

tomaarsen commented 8 months ago

Hello!

I can indeed reproduce this. It seems related to this commit, which changed the modeling files for QWen: https://huggingface.co/Qwen/Qwen-7B/commit/f7bc352f27bb1c02ee371a4576942a7d96c8bb97

I think you'll have more success if you set `revision=...` and pick an older commit, e.g. `"d135dce78bceb52e63e0d2cec4b9b38952fc2cd6"`.
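For illustration, a minimal sketch of that workaround, assuming the model is loaded from the Hub repo `Qwen/Qwen-7B` rather than a local copy (the `revision` argument has no effect on a local path); the other arguments mirror the script above:

```python
import torch
from attention_sinks import AutoModelForCausalLM

# Pin the Qwen-7B modeling files to the older commit mentioned above,
# so the older modeling code that attention_sinks supports is used.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    revision="d135dce78bceb52e63e0d2cec4b9b38952fc2cd6",
    device_map="auto",
    torch_dtype=torch.float16,
    attention_sink_size=4,
    attention_sink_window_size=252,
    trust_remote_code=True,
    use_flash_attn=False,
)
```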

tomaarsen commented 8 months ago

I've resolved it in #33 and packaged it in a new release: 0.4.0. You can install it as usual, e.g. `pip install -U attention_sinks`.
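A quick way to confirm the installed version after upgrading, using only the standard library:

```python
from importlib.metadata import version

# Confirm that the upgraded release (0.4.0 or newer) is the one installed
# in the active environment.
print(version("attention_sinks"))
```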

Hope this helps.

Essence9999 commented 8 months ago

> I've resolved it in #33 and packaged it in a new release: 0.4.0. You can install it as usual, e.g. `pip install -U attention_sinks`.
>
> Hope this helps.
>
> • Tom Aarsen

Thank you very much for your reply. After the upgrade, the problem was solved.

tomaarsen commented 8 months ago

I'm glad to hear it!