marcotcr / checklist

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
MIT License

Most test cases of reducing test_type discarded #80

Closed iamanigeeit closed 3 years ago

iamanigeeit commented 3 years ago

For the SST suite:

Test type Reducers: 1966/2000 are discarded

Test type "used to" should reduce: 4202/4268 are discarded

The expect method returns None for these test cases. How can I fix this?

marcotcr commented 3 years ago

This test has the following description:

A model should not be more confident on "I used to think X" when compared to "X", e.g. "I used to love this airline" should have less confidence than "I love this airline".

Unfortunately, this test can only fail if there is a margin for the model to be 'more confident' on 'I used to think X', i.e. if f(X) is not already very close to 100% confidence. When f(X) is that high, the test case cannot fail, which is why it gets discarded. So this test doesn't really work for very confident models, which is probably your case here. There is no way to fix it : )
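
To make the reasoning concrete, here is a minimal sketch of an expectation that discards cases which cannot fail. It is not CheckList's actual code: the function name `used_to_expectation`, the `tolerance` parameter, and its default of 0.1 are assumptions made for illustration only.

```python
from typing import Optional


def used_to_expectation(orig_conf: float, new_conf: float,
                        tolerance: float = 0.1) -> Optional[bool]:
    """Hypothetical expectation for a '"used to" should reduce' style test.

    orig_conf: model confidence on "X"
    new_conf:  model confidence (same label) on "I used to think X"
    Returns None (discarded) when the case cannot fail, otherwise
    True for pass and False for fail.
    """
    # If the original confidence is already within `tolerance` of 1.0,
    # the perturbed confidence cannot exceed it by more than `tolerance`,
    # so the case can never fail and is discarded.
    if orig_conf + tolerance >= 1.0:
        return None
    # Otherwise the test passes if confidence did not rise beyond tolerance.
    return new_conf <= orig_conf + tolerance


# A very confident model: both cases are discarded.
confident = [used_to_expectation(0.99, 0.97), used_to_expectation(0.995, 0.999)]
# A less confident model: the cases can actually pass or fail.
uncertain = [used_to_expectation(0.6, 0.5), used_to_expectation(0.6, 0.9)]
```

With a model whose confidence on "X" is almost always near 1.0, nearly every case takes the `None` branch, which would match the large discarded counts reported above.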