jayelm / gisting

Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
Apache License 2.0

Any plans to create some sort of gisting framework for arbitrary models? #12

Closed FFFiend closed 1 year ago

FFFiend commented 1 year ago

Basically the title, or excuse my ignorance if this isn't possible. Right now it seems like only FLAN-T5 and LLaMA-7B are supported.

jayelm commented 1 year ago

In principle you should be able to apply gist masking to basically any decoder-only or encoder-decoder Transformer language model (see the paper for details). You're right, though, that this requires training your own model and, in most cases, modifying the LM's implementation to support custom attention masks.

You can take a look at the gist_llama and gist_t5 implementations to see how the masking is done, and diff them against the reference Transformers implementations, but unfortunately I don't have plans to support other LMs right now. Sorry!
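For a concrete sense of what the masking change involves, here is a minimal, hypothetical sketch of a gist attention mask for a decoder-only model. The function name `make_gist_mask` and the `gist_token_id` argument are illustrative, not the repo's actual API; the idea follows the paper: positions after the gist tokens cannot attend to positions before them, so the prompt is only reachable through the gist tokens.

```python
import torch

def make_gist_mask(input_ids: torch.Tensor, gist_token_id: int) -> torch.Tensor:
    """Boolean attention mask of shape (batch, seq, seq); True = may attend.

    Starts from a standard causal mask, then blocks queries that come after
    the (assumed contiguous) gist-token span from attending to keys that come
    before it, so all prompt information must flow through the gist tokens.
    Illustrative sketch only -- not the repo's implementation.
    """
    batch, seq = input_ids.shape
    device = input_ids.device

    is_gist = input_ids == gist_token_id                 # (batch, seq)
    seen_gist = is_gist.long().cumsum(dim=1) > 0         # at or after the first gist token

    pre_gist_key = ~seen_gist                            # keys strictly before the gist span
    post_gist_query = seen_gist & ~is_gist               # queries strictly after the gist span

    # Usual causal mask: query i may attend to key j iff j <= i.
    causal = torch.tril(torch.ones(seq, seq, device=device)).bool()

    # Block (post-gist query, pre-gist key) pairs.
    blocked = post_gist_query.unsqueeze(2) & pre_gist_key.unsqueeze(1)  # (batch, seq, seq)

    return causal.unsqueeze(0) & ~blocked
```

To actually use a mask like this you generally have to convert it to the additive float form the model's attention expects and patch the attention layers to accept a full (batch, seq, seq) mask, which is exactly the kind of modification the gist_llama and gist_t5 files make relative to the reference Transformers code.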


FFFiend commented 1 year ago

Thanks Jesse 😄