jayelm / gisting

Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
Apache License 2.0

About partial compression of instruction #15

Closed (YoojuShin closed this issue 1 year ago)

YoojuShin commented 1 year ago

First of all, I really appreciate your complete codebase for response generation and training.

I have a question about the details of the response generation process. Does Gisting generate responses using only the last-layer hidden state of each gist token? It seems that the activations at all layers are passed on to generate the response after compression. Could you correct me if I'm wrong?

jayelm commented 1 year ago

Hi, that's right: the entire stack of transformer activations on top of the gist tokens (the prefix), not just the last layer, is used for generation. Apologies, the paper is not clear about this.

The NeurIPS camera-ready version (which I'll upload soon) will attempt to be clearer.
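
To illustrate what "the entire stack of activations" means in practice, here is a toy sketch in pure Python. It is not the repository's code. The tiny attention-only model, its sizes, and every name in it (`LAYERS`, `step`, `gist_cache`, `gen_cache`) are assumptions made up for illustration. It only shows that the compressed prefix holds keys and values for the gist positions at every layer, rather than a single last-layer vector per gist token.

```python
import math
import random

NUM_LAYERS, DIM = 3, 4  # toy sizes, chosen only for illustration
rng = random.Random(0)

def rand_matrix():
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]

# One (Wq, Wk, Wv) triple per layer; a real model also has MLPs, norms, heads.
LAYERS = [{"q": rand_matrix(), "k": rand_matrix(), "v": rand_matrix()}
          for _ in range(NUM_LAYERS)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Softmax attention of one query over the cached keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values))
            for i in range(DIM)]

def step(x, cache):
    """Run one token through all layers, appending its K/V to each layer's cache."""
    h = x
    for layer, layer_cache in zip(LAYERS, cache):
        layer_cache["k"].append(matvec(layer["k"], h))
        layer_cache["v"].append(matvec(layer["v"], h))
        out = attend(matvec(layer["q"], h), layer_cache["k"], layer_cache["v"])
        h = [a + b for a, b in zip(h, out)]  # residual connection
    return h

def embed():
    return [rng.uniform(-1, 1) for _ in range(DIM)]

instruction = [embed() for _ in range(5)]  # stand-in for the prompt tokens
gist_tokens = [embed() for _ in range(2)]  # stand-in for the gist tokens

# 1) Compress: gist tokens attend causally to the instruction.
cache = [{"k": [], "v": []} for _ in range(NUM_LAYERS)]
for x in instruction + gist_tokens:
    step(x, cache)

# 2) Keep only the gist positions, but at every layer.
n = len(gist_tokens)
gist_cache = [{"k": c["k"][-n:], "v": c["v"][-n:]} for c in cache]

# 3) Generate: a new token sees the gist prefix and never the instruction.
gen_cache = [{"k": list(c["k"]), "v": list(c["v"])} for c in gist_cache]
new_hidden = step(embed(), gen_cache)
```

Here `gist_cache` has one entry per layer, each holding keys and values for the two gist positions. Passing only the last-layer hidden states would instead be a single list of two vectors.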

YoojuShin commented 1 year ago

@jayelm Thanks for the reply, and congratulations on the acceptance at NeurIPS!