Essence9999 closed this issue 8 months ago.
Hello!
I can indeed reproduce this. It seems related to this commit, which changed the modeling files for QWen: https://huggingface.co/Qwen/Qwen-7B/commit/f7bc352f27bb1c02ee371a4576942a7d96c8bb97
I think you'll have more success if you set `revision=...` and pick an older commit, e.g. `"d135dce78bceb52e63e0d2cec4b9b38952fc2cd6"`.
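As a rough sketch of what pinning the revision could look like: `revision` is a standard `from_pretrained` argument in `transformers`, and this assumes the `attention_sinks` wrapper forwards it unchanged. The helper names `build_load_kwargs` and `load_pinned_model` are made up for illustration. Pinning only makes sense when loading from the Hub (here assumed as `"Qwen/Qwen-7B"`); a local directory such as the one in the demo script already contains whatever modeling files were downloaded.

```python
# Older Qwen-7B commit suggested above, from before the modeling files changed.
QWEN_REVISION = "d135dce78bceb52e63e0d2cec4b9b38952fc2cd6"


def build_load_kwargs(revision=QWEN_REVISION):
    """Collect the keyword arguments for from_pretrained in one place.

    Hypothetical helper: the sink sizes mirror the demo script in this issue.
    """
    return {
        "revision": revision,            # pin the Hub repo to a fixed commit
        "trust_remote_code": True,       # QWen ships its own modeling code
        "use_flash_attn": False,         # Flash Attention is not supported here
        "attention_sink_size": 4,
        "attention_sink_window_size": 252,
    }


def load_pinned_model(model_id="Qwen/Qwen-7B"):
    """Load model and tokenizer at the same pinned commit (downloads weights)."""
    # Imported lazily so the helper above can be inspected without the packages.
    from attention_sinks import AutoModelForCausalLM
    from transformers import AutoTokenizer

    kwargs = build_load_kwargs()
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, revision=kwargs["revision"], trust_remote_code=True
    )
    return model, tokenizer
```

Pinning the tokenizer to the same commit keeps the remote code for both pieces in sync.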
I've resolved it in #33, and packaged it in a new release: 0.4.0. You can install it like normal, e.g. `pip install -U attention_sinks`.
Hope this helps.
- Tom Aarsen
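To confirm that the upgrade actually took effect in the active environment, a small check along these lines may help. It is only a sketch: it assumes the installed distribution is discoverable under the name `attention_sinks`, and the helper names (`parse_release`, `has_fix`, `installed_attention_sinks`) are invented for this example. The version parsing is deliberately simple and ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

REQUIRED = (0, 4, 0)  # release mentioned above as containing the fix


def parse_release(text):
    """Turn a version string such as '0.4.0' into a 3-tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev1'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so '0.4' compares equal to '0.4.0'
    return tuple(parts[:3])


def has_fix(installed, required=REQUIRED):
    """Return True if the installed version is at least the required one."""
    return parse_release(installed) >= required


def installed_attention_sinks():
    """Return the installed version string, or None if it is not installed."""
    try:
        return version("attention_sinks")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_attention_sinks()
    if found is None:
        print("attention_sinks is not installed")
    elif has_fix(found):
        print(f"attention_sinks {found} should include the fix")
    else:
        print(f"attention_sinks {found} is older than 0.4.0, please upgrade")
```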
Thank you very much for your reply. After the upgrade, the problem was solved.
I'm glad to hear it!
Ubuntu 22.04

The demo script is as follows:

```python
import torch
from transformers import AutoTokenizer, TextStreamer, GenerationConfig
from attention_sinks import AutoModelForCausalLM

model_id = "/home/work/projects/model/Qwen-7B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    attention_sink_size=4,
    attention_sink_window_size=252,
    trust_remote_code=True,
    use_flash_attn=False,
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token_id = tokenizer.eos_token_id

# Prompt, in English: "There are many ways to stay healthy"
text = "保持身体健康有多种方式"
input_ids = tokenizer.encode(text, return_tensors="pt").to(model.device)

streamer = TextStreamer(tokenizer)
generation_config = GenerationConfig(
    use_cache=True,
    min_new_tokens=100_000,
    max_new_tokens=1_000_000,
    penalty_alpha=0.6,
    top_k=5,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
generated_tokens = model.generate(
    input_ids,
    generation_config,
    streamer=streamer,
)

output_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
```
Even though I added `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained`, I still get this error:

ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained`.

pip list: