Break chunks generation in RCA when not enough possible chunks, fixes issue #200

scikit-learn-contrib / metric-learn

Metric learning algorithms in Python

http://contrib.scikit-learn.org/metric-learn/

MIT License

1.4k stars 234 forks source link

Break chunks generation in RCA when not enough possible chunks, fixes issue #200 #254

Closed RobinVogel closed 5 years ago

RobinVogel commented 5 years ago

Fixes #200 See comment https://github.com/scikit-learn-contrib/metric-learn/pull/198#issuecomment-490927459

I count for each set of instances of each label how many chunks it holds, and sum those. If it is lower than the number of requested chunks, I raise an error.

bellet commented 5 years ago

Thanks for getting started on this. The solution looks fine but you need to write some tests to ensure that everything works as intended. You should test that the error is raised when we expect it, but it would also be useful to make sure everything goes through in cases when there is enough points. Generally it is good to test edge cases (e.g., just enough points, one point missing, etc).

bellet commented 5 years ago

Regarding where these tests should be put, I think they could either go to the class Test_RCA in metric_learn_test.py (as RCA_Supervised is currently the only algorithm using chunks), or to a new file test_constraints.py.

I think the second option is better for two reasons:

You can test the behavior of the method chunks directly, which will cover all other algorithms using chunks if we add any in the future
test_constraints.py would be the place where we can add more tests in the future for the methods in constraints.py (currently I think we only test the other methods indirectly through calls to supervised variants of specific algorithms, which is not ideal)

RobinVogel commented 5 years ago

I took some time to get familiar with tests and wrote one for the two untested cases in the chunk generation that you discussed. I tried to model what I wrote on the existing tests, and wrote something relatively general, at the risk of being verbose.

I separated the generation of labels from the tests and wrote it myself, since it needs to satisfy some constraints for edge cases.

I hesitated to group the tests in a class, as in test_sklearn_compat.py, since test_constraints.py is supposed to test the class Constraints.

wdevazelhes commented 5 years ago

This looks great !