jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
MIT License

Bug: CustomSelection Initialize with Subset, func Attribute Update Needed #21

Closed. jlevy44 closed this issue 3 years ago

jlevy44 commented 3 years ago

https://github.com/jmschrei/apricot/blob/8b9026508ea788916ae7394aa8ab7b6cd330e594/apricot/functions/custom.py#L176

jlevy44 commented 3 years ago

The linked line should use self.function.

jlevy44 commented 3 years ago

Thanks for the amazing package!

jlevy44 commented 3 years ago

I've temporarily patched this on my end using class inheritance, and it seems to work just fine:

from apricot.functions.custom import CustomSelection

class CustomSelection2(CustomSelection):
    def _initialize(self, X):
        # Skip CustomSelection._initialize (the method with the bug) and call
        # the implementation of the class above CustomSelection directly.
        super(CustomSelection, self)._initialize(X)

        if self.initial_subset is None:
            pass
        elif self.initial_subset.ndim == 2:
            if self.initial_subset.shape[1] != X.shape[1]:
                raise ValueError("The number of columns in the initial subset must "
                    "match the number of columns in X.")
        elif self.initial_subset.ndim == 1:
            # A one dimensional initial subset is used to index the rows of X.
            self.initial_subset = X[self.initial_subset]
        else:
            raise ValueError("The initial subset must be either a two dimensional"
                " matrix of examples or a one dimensional mask.")

        if self.initial_subset is None:
            self.total_gain = 0
        else:
            # The fix: evaluate the initial gain with self.function.
            self.total_gain = self.function(self.initial_subset)
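
For anyone reading along, the key trick in the patch above is `super(CustomSelection, self)`, which starts method lookup after `CustomSelection` so that its own `_initialize` is skipped entirely. Here is a minimal standalone sketch of that pattern. The classes `Base`, `Buggy`, and `Patched` are hypothetical stand-ins, and the attribute-name mismatch is only my assumption about the bug based on the issue title; the actual apricot internals may differ.

```python
class Base:
    """Hypothetical stand-in for the class above the buggy one."""

    def __init__(self, function):
        self.function = function

    def _initialize(self, X):
        self.n_rows = len(X)


class Buggy(Base):
    """Hypothetical stand-in for a class that reads the wrong attribute."""

    def _initialize(self, X):
        super()._initialize(X)
        # Assumed bug: the attribute is stored as `function`, not `func`.
        self.total_gain = self.func(X)


class Patched(Buggy):
    """Subclass that bypasses Buggy._initialize and redoes its work."""

    def _initialize(self, X):
        # Two-argument super: lookup starts after Buggy in the MRO,
        # so this calls Base._initialize and never runs the buggy method.
        super(Buggy, self)._initialize(X)
        self.total_gain = self.function(X)


if __name__ == "__main__":
    selector = Patched(sum)
    selector._initialize([1, 2, 3])
    print(selector.n_rows, selector.total_gain)
```

The downside of this approach is that the subclass has to duplicate everything the skipped method did, so it is only reasonable as a stopgap until the fix lands upstream.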

jlevy44 commented 3 years ago

Happy to open a PR; I just figured it's a quick fix that's probably best done on your end. Thanks!

jmschrei commented 3 years ago

Thanks for the report. This should be updated in 0.6.1, which I just uploaded.