Hello @tomaarsen
Do you have any suggestions about models to implement attention_sinks for?
Perhaps the very recent Yi models?
I tried to add Yi support. I think the Yi tokenizer is not integrated into AutoTokenizer yet, so to test it I used the code provided for YiTokenizer, with tokenizer.model as the vocab_file. If you have any remarks, please let me know.
import torch
from attention_sinks import AutoModelForCausalLM
# YiTokenizer taken from the tokenization_yi.py code shipped with the Yi model repository,
# copied locally since it is not yet exposed through AutoTokenizer
from tokenization_yi import YiTokenizer

model_id = "01-ai/Yi-6B"  # Yi checkpoint used for testing
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # for efficiency:
    device_map="auto",
    torch_dtype=torch.float16,
    # `attention_sinks`-specific arguments:
    attention_sink_size=4,
    attention_sink_window_size=252,  # <- Low for the sake of faster generation
    trust_remote_code=True,
)
model.eval()

# Load the Yi tokenizer directly, with the downloaded tokenizer.model as the vocab file
tokenizer = YiTokenizer("tokenizer.model")
tokenizer.pad_token_id = tokenizer.eos_token_id
Hello!
Apologies for delaying this for a while. Regarding the tokenizer, I think that is because the AutoTokenizer also requires trust_remote_code=True, e.g.:
model_id = "01-ai/Yi-6B"
model = AutoModelForCausalLM.from_pretrained(
model_id,
# for efficiency:
device_map="auto",
torch_dtype=torch.float16,
# `attention_sinks`-specific arguments:
attention_sink_size=4,
attention_sink_window_size=252, # <- Low for the sake of faster generation
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
And then it should be fine!
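As a quick sanity check (a minimal sketch, not from the original thread; the prompt and generation settings are arbitrary), the loaded model and tokenizer can be exercised like this:

# Hypothetical smoke test: tokenize an arbitrary prompt and generate a few tokens
input_ids = tokenizer("Attention sinks let the model ", return_tensors="pt").input_ids.to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=32, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))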
I've added some experiments, run them, and put the results in the README. I also credited you for this addition there!
I noticed that there is no implementation of mpt_pos_shift_attention_forward. I know it's not strictly necessary, since MPT has no positional encoding and therefore no changes are needed, but for consistency I think it would be better to have it. Feel free to accept this pull request or not :). I will try working on adding other models to the library. Thank you for your time.