refactoring-ai / Data-Collection

Collect refactorings with metrics from Java source code.
MIT License

Missing Data Constraints #11

Open jan-gerling opened 4 years ago

jan-gerling commented 4 years ago

As of today, we have very few constraints for the data in our database. A data constraint is an "assertion" over the data, e.g. the process metrics of a later refactoring on a file have to be greater than or equal to those of earlier refactorings on the same file. We do simple sanity checks in the integration tests, especially for the toy-projects, but the stress tests (#146 95) and canary tests showed that we missed many (edge) cases.
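To make the example constraint concrete, here is a minimal sketch of such a check. It assumes the refactoring rows have already been loaded into dictionaries; the keys `file`, `order` and `commit_count` are hypothetical and do not reflect the actual database schema.

```python
from collections import defaultdict


def find_monotonicity_violations(rows, metric="commit_count"):
    """Return (file, earlier_order, later_order) for every pair of
    consecutive refactorings on a file where the metric decreases.

    rows: iterable of dicts with the hypothetical keys 'file', 'order'
    (position of the refactoring in the file's history) and the metric.
    """
    # Group the rows per file so each history is checked on its own.
    by_file = defaultdict(list)
    for row in rows:
        by_file[row["file"]].append(row)

    violations = []
    for file, file_rows in by_file.items():
        # Order the refactorings chronologically before comparing neighbours.
        file_rows.sort(key=lambda r: r["order"])
        for earlier, later in zip(file_rows, file_rows[1:]):
            if later[metric] < earlier[metric]:
                violations.append((file, earlier["order"], later["order"]))
    return violations


rows = [
    {"file": "A.java", "order": 1, "commit_count": 3},
    {"file": "A.java", "order": 2, "commit_count": 2},  # decreases: violation
    {"file": "B.java", "order": 1, "commit_count": 1},
]
violations = find_monotonicity_violations(rows)
```

Returning the violating pairs instead of raising on the first one makes it easier to see how widespread a problem is in a large dataset.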

Advantages:

  1. confidence in the data

For more inspiration look here: https://fontysblogt.nl/testing-machine-learning-applications/

mauricioaniche commented 4 years ago

I think checking the constraints is something we can do in the ML pipeline. For example, whenever we apply a transformation, we make sure the dataset is still as we want it to be!
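A rough sketch of what this could look like, assuming the pipeline is a sequence of plain functions over a dataset; `apply_checked` and the example constraint are hypothetical names, not part of the existing code:

```python
def apply_checked(dataset, transformations, constraints):
    """Apply each transformation in turn and re-check every constraint.

    transformations: list of functions dataset -> dataset.
    constraints: dict mapping a constraint name to a predicate on the dataset.
    Raises ValueError naming the first violated constraint.
    """
    for transform in transformations:
        dataset = transform(dataset)
        for name, holds in constraints.items():
            if not holds(dataset):
                step = getattr(transform, "__name__", repr(transform))
                raise ValueError(f"constraint '{name}' violated after {step}")
    return dataset


def drop_negative_metrics(rows):
    # Example transformation: remove rows with a negative metric value.
    return [r for r in rows if r["commit_count"] >= 0]


# Hypothetical constraint: the dataset must never become empty.
constraints = {"non_empty": lambda rows: len(rows) > 0}

cleaned = apply_checked(
    [{"commit_count": 2}, {"commit_count": -1}],
    [drop_negative_metrics],
    constraints,
)
```

Checking after every step, rather than once at the end, points directly at the transformation that broke the dataset.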

I'm adding the label here.