marcotcr / checklist

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
MIT License

expectation with different labels for each example doesn't work #68

Closed ramji-c closed 3 years ago

ramji-c commented 3 years ago

I am trying to create an MFT with a different label for each example. I pass a list of strings as the labels argument, but when I run the test, I get a 100% failure rate even though the predictions are correct. If I set labels to a single string and re-run the test, correct predictions are no longer marked as failed. Is this a bug? What is the recommended way to use a list of labels?

Minimal reproducible code

import checklist as ck
from checklist.editor import Editor
from checklist.expect import Expect
from checklist.test_types import MFT
from checklist.test_suite import TestSuite
from checklist.pred_wrapper import PredictorWrapper

def dummy_predict(examples):
    data_label_map = {"example1": "1", "example2": "2", "example3": "3"}
    labels = [data_label_map[ex] for ex in examples]
    return labels

if __name__ == "__main__":
    editor = Editor()
    test_data = editor.template("example{idx}",
                                idx=["1", "2", "3"],
                                labels=["1", "2", "3"],
                                meta=True)
    test_suite = MFT(**test_data, capability="Vocabulary")
    test_suite.run(PredictorWrapper.wrap_predict(dummy_predict))
    test_suite.summary()

marcotcr commented 3 years ago

See #49. In your case, if you print(test_data.labels), this is what you get:

[['1', '2', '3'], ['1', '2', '3'], ['1', '2', '3']]

That is why the expectation fails: every example is paired with the whole list ['1', '2', '3'] rather than with a single label. As noted in #49, what you want to do here is something like this:

    test_data = editor.template("example{idx}",
                                idx=["1", "2", "3"],
                                meta=True)
    # Pass the per-example labels to MFT instead of editor.template
    test_suite = MFT(**test_data, labels=['1', '2', '3'], capability="Vocabulary")
    test_suite.run(PredictorWrapper.wrap_predict(dummy_predict))
    test_suite.summary()
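
To see why the two label shapes behave so differently, here is a minimal sketch that does not use CheckList at all. It assumes the expectation boils down to an equality check between each prediction and its label; `failure_rate` is a hypothetical helper written for illustration, not CheckList's actual implementation.

```python
# Simplified stand-in for an equality-based expectation.
# Assumption: this is NOT CheckList's real code, only an illustration.
def failure_rate(predictions, labels):
    """Return the fraction of examples whose prediction differs from its label."""
    failures = sum(pred != label for pred, label in zip(predictions, labels))
    return failures / len(predictions)


predictions = ["1", "2", "3"]

# Shape printed by test_data.labels in this issue:
# every example carries the whole list as its label.
labels_from_template = [["1", "2", "3"], ["1", "2", "3"], ["1", "2", "3"]]

# Shape obtained when the list is passed to MFT: one label per example.
labels_per_example = ["1", "2", "3"]

print(failure_rate(predictions, labels_from_template))  # 1.0
print(failure_rate(predictions, labels_per_example))    # 0.0
```

A string never compares equal to a list, so under this assumption every example fails in the first case, which matches the 100% failure rate reported above.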

ramji-c commented 3 years ago

Ah! I read #49 before, but it still turned out to be a gotcha. Thanks for the quick clarification!