As of today, we have only a few constraints on the data in our database. A data constraint is an "assertion" over the data, e.g. the process metrics of a refactoring have to be higher than or equal to those of earlier refactorings on the same file.
We do simple sanity checks in the integration tests, especially on the toy projects, but the stress tests (#146, #95) and canary tests showed that we missed many (edge) cases.
I think checking the constraints is something we can do in the ML pipeline: whenever we apply a transformation, we verify that the dataset still satisfies them.
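A minimal sketch of such a pipeline check, using the example constraint above. The row schema (`file`, `order`, and the metric names `commits` and `authors`) is a hypothetical assumption for illustration, not our actual schema:

```python
from collections import defaultdict

def find_constraint_violations(rows, metric_keys=("commits", "authors")):
    """Check the constraint that process metrics never decrease for later
    refactorings on the same file.

    `rows` is a list of dicts with at least the keys 'file', 'order'
    (e.g. a commit index or date) and the metric keys (names assumed here).
    Returns a list of (row, metric) pairs that violate the constraint.
    """
    by_file = defaultdict(list)
    for row in rows:
        by_file[row["file"]].append(row)

    violations = []
    for file_rows in by_file.values():
        # Compare each refactoring with the previous one on the same file.
        file_rows.sort(key=lambda r: r["order"])
        for prev, curr in zip(file_rows, file_rows[1:]):
            for key in metric_keys:
                if curr[key] < prev[key]:
                    violations.append((curr, key))
    return violations
```

A transformation step in the pipeline could then simply do `assert not find_constraint_violations(rows)` after it runs, so a violated constraint fails fast instead of silently producing a bad dataset.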
Advantages:
For more inspiration, see: https://fontysblogt.nl/testing-machine-learning-applications/