imoscovitz / wittgenstein

Ruleset covering algorithms for transparent machine learning
MIT License

Dropping samples unnecessarily (possible bug) #15

Open Veghit opened 3 years ago

Veghit commented 3 years ago

https://github.com/imoscovitz/wittgenstein/blob/5dbb2ecdbaee425d0bb547c6d8bdc73c919f35bd/wittgenstein/base.py#L127

The linked line drops duplicates even though it should be fine to have duplicate samples in the data frame. This leads to incorrect probability estimates down the road. Am I missing something in the function's logic here?
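To illustrate the concern with a toy pandas example (not the library's code): value-based deduplication silently shifts class frequencies whenever the data legitimately contains repeated samples.

import pandas as pd

# Two identical positive samples plus one negative sample
df = pd.DataFrame({'x': [1, 1, 2], 'y': [1, 1, 0]})

print(df['y'].mean())                    # 0.667 -- true positive rate
print(df.drop_duplicates()['y'].mean())  # 0.5   -- biased after dropping duplicates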

Arzik1987 commented 2 years ago

@Veghit, I assume the logic behind this operation was to avoid counting one example multiple times, since rules are not mutually exclusive. However, this line indeed causes incorrect behavior when a dataset contains duplicates. Here is an example:

from sklearn.datasets import make_classification
import wittgenstein as lw
import numpy as np

X, y = make_classification(random_state=2021)
ripper = lw.RIPPER(random_state=2021)
ripper.fit(X, y)
print(ripper.score(X, y))  # prints 0.89
print(ripper.score(np.vstack([X, X]), np.hstack([y, y])))  # duplicating every sample should also give 0.89, but returns 0.7

UPD. Having studied the downstream logic, I also see that the command is redundant. Removing it fixes the issue in the example above and does not break the overall logic, since predict() already works with a set of unique indices anyway.
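For reference, one way to keep the no-double-counting intent without collapsing genuine duplicates is to deduplicate by row index rather than by value. A minimal sketch (the frame names and the concat-based union are my assumptions about the covering step, not the library's actual code):

import pandas as pd

# Three samples; rows 0 and 1 are genuine duplicates of each other
df = pd.DataFrame({'a': [1, 1, 2]}, index=[0, 1, 2])

# Suppose two overlapping rules cover rows {0, 1} and {1, 2} respectively
rule1_covered = df.loc[[0, 1]]
rule2_covered = df.loc[[1, 2]]
union = pd.concat([rule1_covered, rule2_covered])  # row 1 appears twice

# Value-based dedup (the behavior on the linked line) also merges rows 0 and 1
print(len(union.drop_duplicates()))           # 2 -- undercounts

# Index-based dedup counts each covered row once but keeps duplicate samples
print(len(union[~union.index.duplicated()]))  # 3 -- correct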