marcotcr / checklist

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
MIT License

Added test_id param to test types #79

Open ramji-c opened 3 years ago

ramji-c commented 3 years ago

In addition to test_name and description, a unique test identifier is often assigned so that individual test cases can be identified quickly. To enable this, I have added a new optional test_id keyword argument to abstract_test as well as to the MFT, DIR, and INV classes. I also created a tests directory with unit tests that check that object creation succeeds with and without this new argument. Please merge if you feel this could be useful.
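To illustrate the shape of the change, here is a minimal, self-contained sketch. It does not reproduce CheckList's actual classes: `AbstractTestSketch` and `MFTSketch` are hypothetical stand-ins, and the constructor signatures are assumptions made for illustration only.

```python
from typing import List, Optional


class AbstractTestSketch:
    """Hypothetical stand-in for a test base class (not CheckList's real code)."""

    def __init__(self,
                 name: Optional[str] = None,
                 description: Optional[str] = None,
                 test_id: Optional[int] = None):
        self.name = name
        self.description = description
        # test_id defaults to None, so existing callers keep working unchanged.
        self.test_id = test_id


class MFTSketch(AbstractTestSketch):
    """Hypothetical stand-in for a test type that forwards test_id to the base class."""

    def __init__(self, data: List[str], labels=None, name=None,
                 description=None, test_id=None):
        super().__init__(name=name, description=description, test_id=test_id)
        self.data = data
        self.labels = labels


# Object creation should succeed both with and without the new argument.
with_id = MFTSketch(["good movie"], labels=1, name="positive words", test_id=1001)
without_id = MFTSketch(["good movie"], labels=1, name="positive words")
```

Because the argument is optional and keyword-only in practice, code written before the change should not need to be touched.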

ramji-c commented 3 years ago

I am actively making small changes to the scripts as part of implementing CheckList at Expedia. I am happy to submit more PRs for the features I am adding. If not, I will just keep them in my fork. Let me know.

marcotcr commented 3 years ago

I thought about having a unique identifier when I first wrote this, and decided at that time that the test name was probably going to be unique enough. Could you elaborate as to why you felt the need for an ID?

Thanks : )

ramji-c commented 3 years ago

Sure. In my implementation, I assign each capability a specific range of test_id values, akin to error codes. See the table below.

[image: table of test_id ranges per capability]

This allows users to quickly associate a test_id with a capability, and it provides an additional differentiating factor, given that test names could become very similar as we create more and more fine-grained tests.

Moreover, all tests are exported to a database, on top of which we are building a web application for different teams to create and run tests. This means that test names are subject to subtle alterations by product and tech teams. Therefore, test_id could serve as the primary key, which would then allow test_name to be modified at any time without affecting the table underneath.
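A rough sketch of the idea, for concreteness. The capability names and numeric ranges here are hypothetical placeholders (the real values are in the image and are not reproduced), and `CAPABILITY_RANGES`, `capability_for`, and `tests_by_id` are illustrative names, not part of CheckList.

```python
# Hypothetical ID ranges per capability, in the spirit of error-code blocks.
CAPABILITY_RANGES = {
    "Vocabulary": range(1000, 2000),
    "Negation": range(2000, 3000),
    "Robustness": range(3000, 4000),
}


def capability_for(test_id: int) -> str:
    """Return the capability whose ID range contains test_id."""
    for capability, ids in CAPABILITY_RANGES.items():
        if test_id in ids:
            return capability
    raise KeyError(f"test_id {test_id} is outside every known range")


# Tests keyed by test_id, mimicking a table whose primary key is the ID.
tests_by_id = {
    2001: {"test_name": "simple negation", "capability": capability_for(2001)},
}

# Renaming a test leaves the key, and anything that references it, untouched.
tests_by_id[2001]["test_name"] = "simple negation (short sentences)"
```

The point of keying on test_id is that a product or tech team can edit test_name freely while every lookup by ID keeps working.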

marcotcr commented 3 years ago

Did you by any chance consider using TestSuite? We group tests by capabilities there as well. I guess I didn't think about exporting tests to a database. Let me think about this a bit more; maybe I should change how tests are saved / exported in general.

ramji-c commented 3 years ago

Oh yeah, I am using a custom test suite, inherited from TestSuite. We create one CustomTestSuite per model, and each of those test suites contains different tests. Like I mentioned, I am actively working on the implementation at Expedia, so there will be a few more features that I might add along the way. I am happy to submit PRs for those in case they could be useful.

ramji-c commented 3 years ago

A note on serializing test suites: the current capability to pickle (or dill) the TestSuite is great. However, in addition to saving the TestSuite object, I also export it as a JSON file, which lends itself to a web application more readily. We could always load a TestSuite object in the backend, but having the test cases saved as JSON makes them easy to copy over for other models and makes for a more responsive UI.
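As a rough illustration of the JSON side, here is a sketch using only the standard library. The record layout (test_id, test_name, capability, test_type, data) is an assumed schema for illustration, not the format used by CheckList or by my fork, and `export_tests` / `import_tests` are hypothetical helper names.

```python
import json
from typing import Dict, List


def export_tests(tests: List[Dict]) -> str:
    """Serialize plain-dict test records to a JSON string, ordered by test_id."""
    ordered = sorted(tests, key=lambda t: t["test_id"])
    return json.dumps({"tests": ordered}, indent=2)


def import_tests(payload: str) -> List[Dict]:
    """Load test records back from a JSON string produced by export_tests."""
    return json.loads(payload)["tests"]


# Hypothetical records; a real export would be built from the suite's tests.
records = [
    {"test_id": 2001, "test_name": "simple negation", "capability": "Negation",
     "test_type": "MFT", "data": ["This is not good."]},
    {"test_id": 1001, "test_name": "positive words", "capability": "Vocabulary",
     "test_type": "MFT", "data": ["good movie"]},
]

payload = export_tests(records)
restored = import_tests(payload)
```

Unlike a pickle, the JSON payload can be read by a web front end directly and copied between models without loading any Python objects.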