Re-implementation of the Patchscopes paper (arXiv:2401.06102, official implementation).
Patchscopes is a tool for inspecting hidden representations of transformer models.
To decode what is encoded in a given representation (for instance, the CEO token in the figure below), activations from a source prompt ("Amazon's former CEO attended Oscars") are patched into a target prompt ("cat->cat; 135->135; hello->hello; ?"). The target prompt is designed with few-shot examples of token repetition to encourage the model to decode the token identity carried by the patched hidden representation.
See attribute_extraction.ipynb for a demo of how to run the code.
Source prompt: Amazon's former CEO attended Oscars \
Source token: CEO \
Target prompt: cat->cat; 135->135; hello->hello; ?->
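
As a rough illustration of the mechanism (not the notebook's exact code), here is a minimal Patchscope sketch. It assumes GPT-2 loaded via Hugging Face transformers; the layer index, the choice to patch source and target at the same layer, and the position-finding helpers are illustrative assumptions.

```python
# Minimal Patchscope sketch (illustrative; see attribute_extraction.ipynb for the real code).
# Assumptions: GPT-2 via Hugging Face transformers, source and target patched at the
# same (arbitrarily chosen) layer, patch applied at the "?" placeholder position.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

source_prompt = "Amazon's former CEO attended Oscars"
target_prompt = "cat->cat; 135->135; hello->hello; ?->"
source_layer = target_layer = 6  # illustrative choice

src_ids = tokenizer(source_prompt, return_tensors="pt").input_ids
tgt_ids = tokenizer(target_prompt, return_tensors="pt").input_ids

# Locate the source token ("CEO") and the "?" placeholder by simple substring search
# over decoded tokens (assumes each stays within a single token).
src_pos = max(i for i, t in enumerate(src_ids[0]) if "CEO" in tokenizer.decode([int(t)]))
tgt_pos = max(i for i, t in enumerate(tgt_ids[0]) if "?" in tokenizer.decode([int(t)]))

captured = {}

def capture_hook(module, inputs, output):
    # output[0] holds the block's hidden states, shape (batch, seq, hidden).
    captured["h"] = output[0][0, src_pos].detach().clone()

handle = model.transformer.h[source_layer].register_forward_hook(capture_hook)
with torch.no_grad():
    model(src_ids)
handle.remove()

def patch_hook(module, inputs, output):
    # Overwrite the target-prompt hidden state at the placeholder position.
    hidden = output[0]
    hidden[0, tgt_pos] = captured["h"]
    return (hidden,) + output[1:]

handle = model.transformer.h[target_layer].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(tgt_ids).logits
handle.remove()

# If the patched representation still encodes the token identity, the model should
# continue the "x->x" pattern with something like " CEO".
next_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_id]))
```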
The paper Language Models Implement Simple Word2Vec-style Vector Arithmetic studies the mechanisms through which LLMs recall information (see Figure 7).
Abstractive tasks involve recalling a token that does not appear in the context, such as in:
Q: What is the capital of Somalia?
A: Mogadishu
Q: What is the capital of Poland?
A:
The authors provide evidence that transformers recall tokens that do not appear in the context through three stages of processing.
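
One way to look at this layer by layer is a logit-lens-style probe: project each layer's hidden state at the last position through the final layer norm and the unembedding, and track the probability of the answer token. The sketch below is a minimal illustration under assumed settings (gpt2-medium, the abstractive prompt above, " Warsaw" as the answer token), not necessarily the authors' exact setup.

```python
# Logit-lens sketch: track p(" Warsaw") at the final position across layers.
# Assumptions: gpt2-medium via Hugging Face transformers; " Warsaw" is taken as the
# (first sub-)token of the expected answer; prompt formatting is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

prompt = (
    "Q: What is the capital of Somalia?\n"
    "A: Mogadishu\n"
    "Q: What is the capital of Poland?\n"
    "A:"
)
answer_id = tokenizer(" Warsaw").input_ids[0]

ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# out.hidden_states: the embedding output plus one entry per block, each of shape
# (batch, seq, hidden). The very last entry already includes the final layer norm,
# so re-applying ln_f there is only approximate; for the intermediate layers this
# is the usual logit-lens projection.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    prob = logits.softmax(-1)[answer_id].item()
    print(f"layer {layer:2d}: p(' Warsaw') = {prob:.4f}")
```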
Here is the result on gpt2-medium:
Extractive tasks, by contrast, require finding a token that does appear in the context, such as in:
The capital of Poland is Warsaw.
Q: What is the capital of Somalia?
A: Mogadishu
Q: What is the capital of Poland?
A:
Extractive tasks saturate immediately.
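
The per-layer probe sketched above can be rerun with this extractive prompt (whose context already contains "Warsaw"); on the paper's account, the answer token's probability should already be high at much earlier layers, which is the saturation referred to here.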
Here is the result on gpt2-medium: