scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

Bug in predict_proba in SAGClassifier #54

Closed casotto closed 8 years ago

casotto commented 8 years ago

Calling predict_proba on a SAGClassifier raises the following error:

  File "C:\Users\M.casotto\AppData\Local\Continuum\Anaconda2\lib\site-packages\lightning\impl\base.py", line 42, in predict_proba
    if len(self.classes_) != 2:

AttributeError: 'StructuredSparsitySAGA' object has no attribute 'classes_'

The self.classes_ attribute is set in BaseClassifier by the _set_label_transformers method, which also fits the LabelBinarizer used to encode the response vector as 1/-1 (neg_label defaults to -1).

def _set_label_transformers(self, y, reencode=False, neg_label=-1):
    if reencode:
        self.label_encoder_ = LabelEncoder()
        y = self.label_encoder_.fit_transform(y).astype(np.int32)
    else:
        y = y.astype(np.int32)

    self.label_binarizer_ = LabelBinarizer(neg_label=neg_label,
                                           pos_label=1)
    self.label_binarizer_.fit(y)
    self.classes_ = self.label_binarizer_.classes_.astype(np.int32)
    n_classes = len(self.label_binarizer_.classes_)
    n_vectors = 1 if n_classes <= 2 else n_classes

    return y, n_classes, n_vectors
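
For reference, a small sketch of what the LabelBinarizer part of that helper provides, using only scikit-learn and NumPy (the labels 3 and 7 are made-up illustration data, not from lightning). Fitting the binarizer records the original labels in classes_, and transform produces the 1/-1 column:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Made-up binary labels, chosen so they differ from the -1/1 encoding.
y = np.array([3, 7, 7, 3])

lb = LabelBinarizer(neg_label=-1, pos_label=1)
lb.fit(y)

# classes_ keeps the original labels, sorted.
classes = lb.classes_

# In the binary case, transform returns a single column of -1/1 values.
Y = lb.transform(y)

print(classes.tolist())    # original labels
print(Y.ravel().tolist())  # encoded labels
```

Note that fitting and transforming are separate steps: a method that only calls fit has not yet encoded anything.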

Unfortunately, in the subclass SAGClassifier, the fit method encodes the response vector as 1/-1 with its own LabelBinarizer instead of calling _set_label_transformers, so self.classes_ is never set:

class SAGClassifier(BaseClassifier, _BaseSAG):
    def fit(self, X, y):
        if not self.is_saga and self.penalty is not None:
            raise ValueError('Penalties in SAGClassifier. Please use '
                             'SAGAClassifier instead.'
                             '.')
        self.label_binarizer_ = LabelBinarizer(neg_label=-1, pos_label=1)
        Y = np.asfortranarray(self.label_binarizer_.fit_transform(y),
                              dtype=np.float64)
        return self._fit(X, Y)
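
The same failure mode can be sketched without lightning. Everything in this block (Base, _set_labels, BypassingClassifier, FixedClassifier, has_classes_after_fit) is a hypothetical stand-in written for illustration, not lightning's actual code: the base class assigns classes_ only inside a helper, so a subclass whose fit skips the helper fails later in predict_proba.

```python
class Base:
    def _set_labels(self, y):
        # Hypothetical helper: the only place classes_ is assigned.
        self.classes_ = sorted(set(y))

    def predict_proba(self, X):
        # Reads classes_, so it raises AttributeError if fit never set it.
        if len(self.classes_) != 2:
            raise NotImplementedError("binary classification only")
        return [[0.5, 0.5] for _ in X]  # placeholder probabilities


class BypassingClassifier(Base):
    def fit(self, X, y):
        # Encodes labels itself and never calls _set_labels.
        pos = max(y)
        self._Y = [1 if label == pos else -1 for label in y]
        return self


class FixedClassifier(Base):
    def fit(self, X, y):
        # Calls the helper first, then encodes against classes_.
        self._set_labels(y)
        pos = self.classes_[-1]
        self._Y = [1 if label == pos else -1 for label in y]
        return self


def has_classes_after_fit(cls):
    """Return True if predict_proba works after fit, False on AttributeError."""
    model = cls().fit([[0.0], [1.0]], [0, 1])
    try:
        model.predict_proba([[0.5]])
    except AttributeError:
        return False
    return True
```

A check like has_classes_after_fit could serve as a regression test for this kind of bug, assuming the real estimator can be built and fitted on a tiny dataset.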

I am not able to compile the lightning package, so I cannot test this, but replacing the code above with

class SAGClassifier(BaseClassifier, _BaseSAG):
    def fit(self, X, y):
        if not self.is_saga and self.penalty is not None:
            raise ValueError('Penalties in SAGClassifier. Please use '
                             'SAGAClassifier instead.'
                             '.')

        # Fits self.label_binarizer_ and sets self.classes_. The helper
        # returns y only cast to int32, not binarized, so the 1/-1
        # encoding still has to come from the fitted binarizer.
        self._set_label_transformers(y, neg_label=-1)
        Y = np.asfortranarray(self.label_binarizer_.transform(y),
                              dtype=np.float64)
        return self._fit(X, Y)

should work fine.

fabianp commented 8 years ago

@casotto thanks for the report. I submitted a fix as pull request #55.