Closed: izaskr closed this issue 2 years ago
Hi - thanks for the reproducible code! You are indeed using the library correctly. If we assume minicons does not have a bug, then this might be expected behavior. I checked by replacing the "die" before "unterschiedlichsten" with "der" (which I am assuming is the wrong gender, thereby making the sentence ungrammatical), and the scores do become worse:
wrong_log_probs = gpt_model_scorer.token_score(["Der Mensch sammelt der unterschiedlichsten Gegenstände."])
wrong_log_probs
# [[('Der', 0.0),
# ('Mensch', -101.90458679199219),
# ('sammelt', -101.98616790771484),
# ('der', -100.152099609375),
# ('unterschiedlichsten', -102.6846694946289),
# ('Gegenstände', -92.48149871826172),
# ('.', -91.42720031738281)]]
which leads me to conclude it might just be a model thing.
I am going to run more checks when I have the bandwidth, but let me know if this makes sense! Thanks for using minicons :)
Hi, thank you for replying so quickly.
Indeed, "der" would be incorrect. But what is unexpected is that the English sentence gets higher log-probabilities, right?
Yes, that is completely unexpected and indeed very surprising. Was this model fine-tuned from an English corpus?
You can also do a sanity check by inspecting the rank of each token (ranked by log-probability):
e.g., the 'die' stimulus yields much more favorable ranks for the subsequent words than the 'der' stimulus does:
gpt_model_scorer.token_score(["Der Mensch sammelt die unterschiedlichsten Gegenstände."], rank=True)
'''OUTPUT:
[[('Der', 0.0, 0),
('Mensch', -104.85952758789062, 25),
('sammelt', -104.52547454833984, 759),
('die', -98.28511047363281, 2),
('unterschiedlichsten', -101.32772827148438, 98),
('Gegenstände', -91.16206359863281, 9),
('.', -91.17868041992188, 4)]]
'''
gpt_model_scorer.token_score(["Der Mensch sammelt der unterschiedlichsten Gegenstände."], rank=True)
'''OUTPUT:
[[('Der', 0.0, 0),
('Mensch', -104.85952758789062, 25),
('sammelt', -104.52547454833984, 759),
('der', -102.77133178710938, 135),
('unterschiedlichsten', -105.02896881103516, 6482),
('Gegenstände', -94.78482818603516, 27),
('.', -93.73509216308594, 4)]]
'''
I will try to manually check without minicons soon, but I cannot guarantee how soon :P
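For concreteness, a manual check along these lines could look something like the sketch below, which bypasses minicons entirely and scores each subword directly with transformers. Note these are subword-level log-probabilities, so they will not line up one-to-one with minicons' word-level output; the aggregation step is left out here.

# A minimal sketch of a manual check using the transformers API directly.
# The model name is taken from the thread; everything else is standard HF usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")
model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2")
model.eval()

sentence = "Der Mensch sammelt die unterschiedlichsten Gegenstände."
input_ids = tokenizer(sentence, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits

# Position i's logits predict token i+1, so shift targets by one.
# The first token gets no score, matching the 0.0 minicons assigns to 'Der'.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = input_ids[0, 1:]
token_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)

for tok, lp in zip(tokenizer.convert_ids_to_tokens(targets.tolist()), token_log_probs):
    print(f"{tok}\t{lp.item():.3f}")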
Hi @izaskr -- it seems like the reply to the issue in the model's repo explains the observed behavior? Would it be OK if I closed this issue then?
Closing for now -- feel free to reopen if you find minicons-specific issues!
Hi,
This issue is a question. I'm using the German GPT-2 model dbmdz/german-gpt2 to get log-probability and surprisal scores for each token. The log-probabilities are quite low, given that the sentence is a grammatical German sentence. Am I using your code in the intended way, or is the issue with the GPT-2 model itself? Below are my code and a comparison between a German and an English sentence with the same model.
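A minimal sketch of the setup (the scorer construction matches the calls quoted above, but the device, the surprisal options, and the English sentence are assumptions based on minicons' documented API, since the original snippet is not quoted here):

# A minimal sketch of the setup; options here are assumptions.
from minicons import scorer

gpt_model_scorer = scorer.IncrementalLMScorer("dbmdz/german-gpt2", "cpu")

# Word-level log-probabilities for a grammatical German sentence...
german_scores = gpt_model_scorer.token_score(
    ["Der Mensch sammelt die unterschiedlichsten Gegenstände."]
)
# ...and for an English sentence, scored with the same German model.
# (Placeholder translation; the exact comparison sentence is not quoted above.)
english_scores = gpt_model_scorer.token_score(
    ["The human collects the most diverse objects."]
)
print(german_scores)
print(english_scores)

# Surprisals instead of log-probabilities (per the minicons README):
german_surprisals = gpt_model_scorer.token_score(
    ["Der Mensch sammelt die unterschiedlichsten Gegenstände."],
    surprisal=True, base_two=True
)
print(german_surprisals)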