data_status only keeps the last applicable tested status of each dataset

The way the filters are set up, each dataset only retains the last 'reason-for-exclusion' that applied to it; any earlier failed tests are overwritten. To me it makes more sense to keep track of the full set of test results per dataset. That way it becomes easier to identify which constraints to relax if you would like a larger study.
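A minimal sketch of the idea, not the current implementation: the filter names, thresholds, and metadata keys below (`FILTERS`, `n_instances`, `n_classes`, `n_missing`) are made up for illustration and do not come from the existing code. Each dataset maps to the set of every filter it fails, rather than to a single status.

```python
from collections import Counter

# Hypothetical filters (illustrative names and thresholds only).
# Each predicate returns True when the dataset PASSES the test.
FILTERS = {
    "too_few_instances": lambda d: d["n_instances"] >= 500,
    "too_many_classes": lambda d: d["n_classes"] <= 10,
    "has_missing_values": lambda d: d["n_missing"] == 0,
}


def collect_failures(datasets):
    """Map each dataset id to the set of ALL filters it fails."""
    return {
        did: {reason for reason, passes in FILTERS.items() if not passes(meta)}
        for did, meta in datasets.items()
    }


def sole_blockers(failures):
    """Count, per filter, the datasets excluded by that filter alone."""
    return Counter(
        next(iter(reasons)) for reasons in failures.values() if len(reasons) == 1
    )


# Toy metadata, invented for this example.
datasets = {
    1: {"n_instances": 100, "n_classes": 2, "n_missing": 0},
    2: {"n_instances": 1000, "n_classes": 20, "n_missing": 5},
    3: {"n_instances": 1000, "n_classes": 2, "n_missing": 0},
}

failures = collect_failures(datasets)
blockers = sole_blockers(failures)
```

With the full set available, a count like `sole_blockers` shows which single constraint, if relaxed, would admit the most additional datasets. That question cannot be answered when only the last failed test is stored.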

I'll go ahead and implement it myself either way, and will open a PR when it's done. I opened this issue to ask whether this was considered before, whether there are good reasons not to do it, and whether there are any related features that would make sense to add at the same time.

openml/benchmark-suites, issue #38