kanishkamisra / minicons

Utility for behavioral and representational analyses of Language Models
https://minicons.kanishka.website
MIT License

Add `within_word_l2r` pseudo-log-likelihood scoring method for masked language models #31

Closed carina-kauf closed 1 year ago

carina-kauf commented 1 year ago

This PR adds the `within_word_l2r` pseudo-log-likelihood (PLL) scoring method for masked language models, proposed in Kauf & Ivanova (2023), to the MaskedLMScorer class within the scorer module.
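
As a rough illustration of what distinguishes this metric, the sketch below lists which positions would be masked when scoring each token. It assumes the scheme described by Kauf & Ivanova (2023), where the target token is masked together with any later word pieces of the same word, and it assumes WordPiece continuation pieces start with `##`. The function name `masked_positions` and the baseline label `"single_token"` are hypothetical and are not part of this PR or of minicons.

```python
from typing import List, Set


def masked_positions(tokens: List[str], target: int, metric: str) -> Set[int]:
    """Return the token indices to mask when scoring tokens[target].

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    if metric == "single_token":
        # Baseline assumption: mask only the token being scored.
        return {target}
    if metric == "within_word_l2r":
        # Also mask the remaining pieces of the same word to the right,
        # assuming continuation pieces carry the WordPiece "##" prefix.
        positions = {target}
        j = target + 1
        while j < len(tokens) and tokens[j].startswith("##"):
            positions.add(j)
            j += 1
        return positions
    raise ValueError(f"unknown metric: {metric}")


tokens = ["the", "traveler", "lost", "the", "so", "##uven", "##ir", "."]

for i, tok in enumerate(tokens):
    print(tok, sorted(masked_positions(tokens, i, "within_word_l2r")))
```

Under these assumptions, scoring `so` hides `##uven` and `##ir` as well, so the model cannot use the later pieces of "souvenir" to guess the first one.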

Key addition: PLL_metric='within_word_l2r' scoring option

The key addition is a new optional string argument called PLL_metric, which can take one of two values:

The optional PLL_metric string argument was added to functions within the MaskedLMScorer class, including sequence_score and token_score (both shown under Usage).

Usage

# Import the scorer module from the installed minicons package
from minicons import scorer

# Load BERT as a masked-LM scorer on CPU
mlm_model = scorer.MaskedLMScorer('bert-base-uncased', 'cpu')

stimuli = ['The traveler lost the souvenir.']

# Sequence-level score: negate the sum of the token log-probabilities
print(mlm_model.sequence_score(stimuli, reduction=lambda x: -x.sum(0).item(), PLL_metric='within_word_l2r'))
'''
[32.77983617782593]
'''

# Token-level scores: one (token, log-probability) pair per word piece
print(mlm_model.token_score(stimuli, PLL_metric='within_word_l2r'))
'''
[[('the', -0.07324600219726562), ('traveler', -9.668401718139648), ('lost', -6.955361366271973),
('the', -1.1923179626464844), ('so', -7.776356220245361), ('##uven', -6.989711761474609),
('##ir', -0.037807464599609375), ('.', -0.08663368225097656)]]
'''
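
The two outputs above are consistent with each other: with the reduction `-x.sum(0)`, the sequence score should equal the negated sum of the token scores. The check below copies the token scores from the output above and redoes that arithmetic in plain Python. It needs no model, and the variable names are illustrative only.

```python
# Token scores copied from the token_score output above
token_scores = [
    ("the", -0.07324600219726562),
    ("traveler", -9.668401718139648),
    ("lost", -6.955361366271973),
    ("the", -1.1923179626464844),
    ("so", -7.776356220245361),
    ("##uven", -6.989711761474609),
    ("##ir", -0.037807464599609375),
    (".", -0.08663368225097656),
]

# Same reduction as in the sequence_score call: negate the summed scores
total = -sum(score for _, score in token_scores)

# Value reported by sequence_score above
reported = 32.77983617782593

print(round(total, 4))                # 32.7798
print(abs(total - reported) < 1e-4)   # True
```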
netlify[bot] commented 1 year ago

Deploy Preview for pyminicons canceled.

Name Link
Latest commit 5637b27bc178f4224cd8838530c9c47efe07aefe
Latest deploy log https://app.netlify.com/sites/pyminicons/deploys/648869ec7c15cc000876fd54
kanishkamisra commented 1 year ago

this is brilliant! Thanks @carina-kauf!!