Closed: tomaarsen closed this pull request 10 months ago
Confirmed working, even tested with a few GPTQ models! I just needed to install via git:

```
pip install git+https://github.com/tomaarsen/attention_sinks.git
```
That's awesome! I'm preparing a release now so the install is a bit easier; I'm just making some edits to the README and CHANGELOG first :)
Thanks for helping with testing!
v0.2.2 has been released, which includes this PR.
Closes #1
Hello!
## Pull Request overview

* Prevent the `attention_mask` from growing endlessly, fixing crashes with `model.generate`.
## Details
The `_update_model_kwargs_for_generation` method in `GenerationMixin` would endlessly grow the `attention_mask` to match `past_key_values + 1`, which is normally very reasonable. However, with `attention_sinks` we eventually cap the `past_key_values`, so it ended up crashing. This change very simply prevents the endless growth of the `attention_mask`, so it always matches `past_key_values + 1`.
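To illustrate the idea (this is a minimal sketch of the capping logic, not the exact patch in this PR; `window_size` is an assumed name standing in for the capped `past_key_values` length):

```python
import torch

def update_attention_mask(attention_mask: torch.Tensor, window_size: int) -> torch.Tensor:
    # Append a 1 for the token generated this step, as transformers normally does.
    attention_mask = torch.cat(
        [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
    )
    # Cap the mask so it never outgrows the (capped) key/value cache + 1,
    # instead of letting it grow by one token per generation step forever.
    if attention_mask.shape[-1] > window_size + 1:
        attention_mask = attention_mask[:, -(window_size + 1):]
    return attention_mask
```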
## Usage
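As a hedged sketch of how this looks in practice, assuming the library's drop-in replacement classes and the `attention_sink_size` / `attention_sink_window_size` keyword arguments (and an example model name):

```python
from transformers import AutoTokenizer
from attention_sinks import AutoModelForCausalLM

# attention_sinks mirrors the transformers API; the sink-specific kwargs
# below are assumptions based on the library's drop-in design.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    attention_sink_size=4,           # initial "sink" tokens that are always kept
    attention_sink_window_size=1020  # sliding window of recent tokens
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
# With this PR, model.generate no longer crashes once past_key_values is capped.
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```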