Closed YoojuShin closed 1 year ago
Hi, that's right—the entire stack of transformer activations on top of the gist tokens (the prefix) is used for generation. Apologies, the paper is not clear about this.
The NeurIPS camera ready version (which I'll upload soon) will attempt to be clearer.
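To make the mechanism concrete, here is a minimal toy sketch (plain NumPy, not the actual Gisting code, with made-up dimensions and weights) of the idea: the gist tokens are run through every layer once, the keys/values are cached *per layer* (analogous to `past_key_values` in HuggingFace Transformers), and each decoded token attends to that cache at every layer — not just to the last hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers, n_gist = 8, 2, 3  # hypothetical toy sizes

# Stand-in per-layer Q/K/V projection weights (not real model weights).
Wq = [rng.standard_normal((d, d)) for _ in range(n_layers)]
Wk = [rng.standard_normal((d, d)) for _ in range(n_layers)]
Wv = [rng.standard_normal((d, d)) for _ in range(n_layers)]

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# 1) Run gist tokens through every layer, caching K/V at each layer.
#    This per-layer cache is the "prefix" used for generation.
h = rng.standard_normal((n_gist, d))  # gist token embeddings
kv_cache = []
for l in range(n_layers):
    k, v = h @ Wk[l], h @ Wv[l]
    kv_cache.append((k, v))
    q = h @ Wq[l]
    h = softmax(q @ k.T / np.sqrt(d)) @ v  # self-attention over gist tokens

# 2) During decoding, a new token attends to the cached K/V at *each* layer,
#    so all layers' gist activations influence generation.
x = rng.standard_normal((1, d))
for l, (k_g, v_g) in enumerate(kv_cache):
    k = np.vstack([k_g, x @ Wk[l]])  # prepend cached gist keys
    v = np.vstack([v_g, x @ Wv[l]])  # prepend cached gist values
    q = x @ Wq[l]
    x = softmax(q @ k.T / np.sqrt(d)) @ v
```

If only the last hidden layer of the gist tokens were kept, step 2 would have nothing to attend to at the earlier layers; caching K/V at every layer is what lets the compressed prefix behave like the original prompt.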
@jayelm Thanks for the reply, and congrats on the acceptance at NeurIPS!
First of all, I really appreciate your complete codebase for response generation and training.
I have a detailed question about the response generation process. Does Gisting generate responses using only the last hidden layer of each gist token? It seems that the whole set of activations at all layers is handed over to generate the response after compression. Could you correct me if I'm wrong?