jalammar / ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
https://ecco.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Help understanding position arg of layer_predictions #27

Closed mapmeld closed 3 years ago

mapmeld commented 3 years ago

In the gpt2 model, I am measuring the distribution of calendar dates.

import ecco
lm = ecco.from_pretrained('gpt2')
output_0 = lm.generate("On January", generate=1, do_sample=False)

I assumed that to read predictions for the next token, I would need either position=0 or position=2 depending on whether it referred to the 0th token of the full string or the generated output. I was surprised to see these return the same tokens and probabilities:

output_0.layer_predictions(position=0, layer=11, topk=5)
output_0.layer_predictions(position=2, layer=11, topk=5)

If I query position=1 then I see 'the' and other tokens which might follow "On " in the original sentence.

output_0.layer_predictions(position=1, layer=11, topk=5)
jalammar commented 3 years ago

position=2 is the correct parameter. position=0 returns the same result because the hidden-state lookup subtracts one from the position, so position=0 reads element "-1". Under Python array indexing, index -1 wraps around to the last item in the array, which is the same element that position=2 resolves to.
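To make the wraparound concrete, here is a minimal plain-Python illustration (not the actual ecco internals; the list and names are hypothetical). With two input tokens, the lookup index position - 1 gives -1 for position=0, and -1 wraps to the last element, the same one position=2 resolves to:

```python
# Hypothetical per-token hidden states for the two input tokens "On" and " January".
# ecco's lookup subtracts one from `position` before indexing.
hidden_states = ["h_On", "h_January"]

index_for_position_0 = 0 - 1  # -1: wraps to the last element in Python
index_for_position_2 = 2 - 1  # 1: the last element of a 2-item list

# Both positions silently resolve to the same hidden state:
assert hidden_states[index_for_position_0] is hidden_states[index_for_position_2]
```

This is why the two calls in the issue return identical tokens and probabilities.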

I'll have it throw an error if position=0 is entered. That's because position 0 is always an input token. The model did not generate anything in that position. So there are no probabilities associated with that position.
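The promised guard could look something like this sketch (the function name and message are illustrative assumptions, not ecco's actual implementation): reject position=0 up front, since the model never generates a token at that position.

```python
def check_position(position: int) -> None:
    """Hypothetical validation for layer_predictions (illustrative only).

    Position 0 is always an input token: the model generated nothing there,
    so there are no prediction probabilities associated with it.
    """
    if position == 0:
        raise ValueError(
            "position=0 refers to an input token; the model did not generate "
            "anything at that position, so it has no associated probabilities."
        )

check_position(2)  # fine: a generated position
# check_position(0)  # would raise ValueError
```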

Does that make sense?

mapmeld commented 3 years ago

Yes, your explanation and proposed error make sense to avoid this problem.

jalammar commented 3 years ago

Great!