marcotcr / checklist

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
MIT License

[bug] editor.hypernyms returns a single suggestion, multiple expected #74

Closed jlema closed 3 years ago

jlema commented 3 years ago

According to the data-generation notebook, editor.hypernyms is supposed to return multiple suggestions, as follows:

editor.hypernyms('My dog eats other animals.', 'dog')
['animal', 'creature', 'food', 'beast', 'meat', 'mammal', 'person', 'organism']

Currently it is returning a single suggestion:

editor.hypernyms('My dog eats other animals.', 'dog')
['animal']

This also affects related_words.

Expected:

editor.related_words('My dog eats other animals.', 'dog')[:5]
['cat', 'animal', 'monkey', 'pet', 'elephant']

Current behavior:

editor.related_words('My dog eats other animals.', 'dog')[:5]
['pet']

marcotcr commented 3 years ago

The behavior you observe is due to the likelihood threshold (default is 5). I guess RoBERTa doesn't like 'My food eats other animals', which is probably a good thing. Examples:

editor.hypernyms('My dog eats other animals.', 'dog', threshold=10)

['animal', 'food', 'meat', 'soul', 'beast', 'creature', 'person', 'organism', 'sausage', 'unit', 'being', 'male']

editor.related_words('I have a pet dog.', 'dog')

['bird', 'kitten', 'puppy', 'toy']
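
To make the effect of the threshold concrete, here is a minimal, self-contained sketch of threshold-based filtering. It assumes the threshold bounds how much less likely a substituted sentence may be than the original, measured as a log-likelihood gap; the function name `filter_by_threshold` and all scores are hypothetical illustrations, not checklist's actual implementation or real RoBERTa output.

```python
from typing import Dict, List


def filter_by_threshold(
    original_score: float,
    candidate_scores: Dict[str, float],
    threshold: float = 5.0,
) -> List[str]:
    """Keep candidates whose score is at most `threshold` below the original's.

    Hypothetical helper: it only illustrates the idea of a likelihood
    threshold and is not part of the checklist API.
    """
    kept = [
        (word, score)
        for word, score in candidate_scores.items()
        if original_score - score <= threshold
    ]
    # Most likely substitutions first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept]


# Made-up log-likelihood scores for replacing 'dog' in
# 'My dog eats other animals.' (assumed values, for illustration only).
original = -2.0
candidates = {
    "animal": -4.5,
    "creature": -9.0,
    "food": -10.5,
    "organism": -11.5,
}

print(filter_by_threshold(original, candidates, threshold=5))
print(filter_by_threshold(original, candidates, threshold=10))
```

With these assumed scores, the default threshold keeps only 'animal', while raising it to 10 admits the less likely substitutions as well, which mirrors the difference between the two calls reported in this issue.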