mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

ENH: GAT object should accept scorer strings as in sklearn.cross_val_score #2176

Closed dengemann closed 8 years ago

dengemann commented 9 years ago

It seems the scoring API is a bit broken: the code does not allow for scorers that require the classifier as an input. This cannot even be fixed using a partial function, as the clf is copied inside. cc @kingjr

kingjr commented 9 years ago

I wasn't aware of this prerequisite. Can you give an example where the scorer needs the classifier?

kingjr commented 9 years ago

In http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics, I only see scorers that require y_true and y_pred, or the like, but I guess I'm missing something.

Are you referring to cross_validation.cross_val_score(estimator, X)? In our case, the cross-validation is already implemented within the GAT.

dengemann commented 9 years ago

@kingjr sklearn.metrics.SCORERS['roc_auc'] returns a function that takes the clf as input too. I'm just saying we would need this kind of handling inside the GAT. Ultimately I would like to simply pass scoring='roc_auc' as I do in cross_val_score.
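
A minimal sketch of what that scorer object looks like (this assumes the sklearn of that era, where sklearn.metrics.SCORERS was a public dict, and that clf, X and y are a fitted classifier and its evaluation data):

from sklearn.metrics import SCORERS

scorer = SCORERS['roc_auc']
# The returned callable has signature scorer(estimator, X, y): it calls
# estimator.decision_function (or predict_proba) itself, which is why the
# classifier object, and not just y_pred, must be available to it.
score = scorer(clf, X, y)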

kingjr commented 9 years ago

Right... I need to think about this. But in the meantime, can't you just use scorer=sklearn.metrics.roc_auc_score?

choldgraf commented 9 years ago

Jumping on this with a related question: if we use roc_auc_score, doesn't that require that the model we've fit use predict_proba rather than just predict? I looked through the GAT code briefly and couldn't find a flag that said something like "if clf.probability is True: do predict_proba instead of predict".

agramfort commented 9 years ago

the new scoring API in sklearn does this magic for you.

kingjr commented 9 years ago

Yes, we decided to keep the GAT API simple for now. What I do to get probabilities is create a class that inherits from SVC and changes the predict behavior to predict_proba.
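
A minimal sketch of that trick (the class name is hypothetical; probability=True is needed so that SVC exposes predict_proba):

from sklearn.svm import SVC

class ProbaSVC(SVC):
    def predict(self, X):
        # Return class probabilities instead of hard labels.
        return self.predict_proba(X)

clf = ProbaSVC(probability=True)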

choldgraf commented 9 years ago

ah ok - I wonder if one could just do

svm.predict = svm.predict_proba

and then just go from there

choldgraf commented 9 years ago

Ah, I can see why you guys wanted to keep the API simple for now... I was running into "incorrect shape" errors, so it seems like the simplest way to get the code working is to make predict_proba output only the second column of its probabilities (assuming that n_classes is 2).

@agramfort are you talking about the .17 release then? Or is this something that's already implemented?
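
A one-line sketch of that workaround (hypothetical names; assumes a binary problem, so column 1 is the positive class):

y_scores = clf.predict_proba(X)[:, 1]  # 1-D scores, as roc_auc_score expects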

kingjr commented 9 years ago

That's right. And predict_proba is not the only output function out there... I sometimes use multidimensional predictions, but then you have to build a scorer that knows what to do with them.
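
As a hedged sketch of such a scorer (the names and the averaging strategy are assumptions, not code from this thread), one could reduce an (n_samples, n_classes) prediction array by averaging one-vs-rest AUCs:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelBinarizer

def scorer_mean_auc(y_true, y_pred):
    # y_pred: (n_samples, n_classes) probabilities; assumes a multiclass
    # problem (n_classes > 2) so the binarized labels align with y_pred.
    Y = LabelBinarizer().fit_transform(y_true)
    return np.mean([roc_auc_score(Y[:, ii], y_pred[:, ii])
                    for ii in range(Y.shape[1])])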

choldgraf commented 9 years ago

Cool - maybe I will write a post/example on getting this to work with AUC for future reference, thanks for your clarification!

teonbrooks commented 9 years ago

@choldgraf that would be really useful. I just had to deal with this myself: I am calculating the scores from cross-validation and I wanted to have the AUCs too. @kingjr, for clarification, are you saying that you can create a scorer that outputs both the classification score and the AUC?

choldgraf commented 9 years ago

Cool, maybe I can hash one out as I'm visiting Paris next week. I think it would be particularly useful for AUC, since the paper that "introduced" the time-generalization method explicitly recommended AUC as the metric they used (the current debate about the usefulness of AUC notwithstanding).

agramfort commented 9 years ago

we should start tagging issues and PRs for the sprint.

Please go for it, anyone.

kingjr commented 9 years ago

I've got some working examples already, I'll put a ref here.

kingjr commented 9 years ago

Here's an example:

import mne
from mne.datasets import spm_face
from mne.decoding import GeneralizationAcrossTime
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Preprocess data
data_path = spm_face.data_path()
# Load and filter data, set up epochs
raw_fname = data_path + '/MEG/spm/SPM_CTF_MEG_example_faces%d_3D_raw.fif'
raw = mne.io.Raw(raw_fname % 1, preload=True)  # Take first run
picks = mne.pick_types(raw.info, meg=True, exclude='bads')
raw.filter(1, 45, method='iir')
events = mne.find_events(raw, stim_channel='UPPT001')
event_id = {"faces": 1, "scrambled": 2}
tmin, tmax = -0.1, 0.5
decim = 4  # decimate to make the example faster to run
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True,
                    picks=picks, baseline=None, preload=True,
                    reject=dict(mag=1.5e-12), decim=decim, verbose=False)

class clf_proba(LogisticRegression):
    def predict(self, X):
        self.y_pred_ = super(clf_proba, self).predict_proba(X)
        return self.y_pred_

def scorer_auc(y_true, y_pred):
    from sklearn.metrics import roc_auc_score
    from sklearn.preprocessing import LabelBinarizer
    le = LabelBinarizer()
    y_true = le.fit_transform(y_true)
    return roc_auc_score(y_true, y_pred[:, 1])

clf = Pipeline([('scaler', StandardScaler()), ('clf', clf_proba())])

gat = GeneralizationAcrossTime(clf=clf, n_jobs=2, scorer=scorer_auc)
gat.fit(epochs)
gat.score(epochs)
gat.plot()

"current debate about usefulness of AUC notwithstanding"

Could you give a ref? Do you have alternative proposals we should use by default?

EDIT: This example is now outdated. Use GeneralizationAcrossTime(predict_method='predict_proba') instead.
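
A sketch of that updated usage (assuming the later API with the predict_method parameter, and reusing clf and scorer_auc from above):

gat = GeneralizationAcrossTime(clf=clf, predict_method='predict_proba',
                               scorer=scorer_auc, n_jobs=2)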

kingjr commented 9 years ago

@kingjr, for clarification, are you saying that you can create a scorer that output both the classification score and the AUC?

You could if you wanted to; the dimensionality of the scorer's output isn't restricted, if I recall correctly. However, I think the easiest is to run the scorer twice, without re-passing the epochs.

Using the example above you'd have something like this:

from sklearn.metrics import accuracy_score

gat = GeneralizationAcrossTime(clf=clf, n_jobs=2, scorer=scorer_auc)
gat.fit(epochs)
gat.predict(epochs)

# Score twice from the stored predictions, swapping the scorer in between.
gat.scorer_ = scorer_auc
scores_auc = gat.score()
gat.scorer_ = accuracy_score  # note: accuracy expects hard labels, not probabilities
scores_acc = gat.score()

choldgraf commented 9 years ago

Could you give a ref? Do you have alternative proposals we should use by default?

Seems like the wikipedia entry [1] has some good refs...here's the relevant paragraph:

The machine learning community most often uses the ROC AUC statistic for model comparison.[10] However, this practice has recently been questioned based upon new machine learning research that shows that the AUC is quite noisy as a classification measure[11] and has some other significant problems in model comparison.[12][13] A reliable and valid AUC estimate can be interpreted as the probability that the classifier will assign a higher score to a randomly chosen positive example than to a randomly chosen negative example. However, the critical research[11][12] suggests frequent failures in obtaining reliable and valid AUC estimates. Thus, the practical value of the AUC measure has been called into question,[13] raising the possibility that the AUC may actually introduce more uncertainty into machine learning classification accuracy comparisons than resolution. Nonetheless, the coherence of AUC as a measure of aggregated classification performance has been vindicated, in terms of a uniform rate distribution,[14] and AUC has been linked to a number of other performance metrics such as the Brier score.[15]

I don't have the statistical training to really know the ins and outs of the debate, but it seems like there is a debate anyway (I suppose there always is for anything in academia).

Thanks for the example, btw...very useful!

[1] https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve

choldgraf commented 9 years ago

Also, per your example, couldn't you even implement a convenience mixin in case you wanted to try a different classifier, e.g.:

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

class PredictMixin(object):
    def predict(self, X):
        # super() must name the mixin itself so the MRO resolves to
        # whichever classifier this is mixed into.
        self.y_pred_ = super(PredictMixin, self).predict_proba(X)
        return self.y_pred_

class myRF(PredictMixin, RandomForestClassifier):
    pass

class mySVM(PredictMixin, SVC):
    # note: requires SVC(probability=True); LinearSVC has no predict_proba
    pass

or is that just super, super lazy? :)

kingjr commented 9 years ago

Seems like the wikipedia entry [1] has some good refs...here's the relevant paragraph:

Mmmh, I wasn't aware of this at all. My impression so far is that AUC is much better than accuracy, at least in my datasets and simulations, both in terms of stability and usability (accuracy is just not very good for GAT analyses), although it's true that the AUC is more computationally expensive.

If anyone wants to give some practical feedback here, that'd be much appreciated.

couldn't you even implement a convenience mixin in case you wanted to try a different classifier

Sure! I didn't know this syntax #StillLearning. In my case, I have actually done something like this for generic use cases:

class force_predict(object):
    def __init__(self, clf, mode='predict_proba', axis=0):
        self._mode = mode
        self._axis = axis
        self._clf = clf

    def fit(self, X, y, **kwargs):
        self._clf.fit(X, y, **kwargs)
        self._copyattr()
        return self  # sklearn convention: fit returns the estimator

    def predict(self, X):
        if self._mode == 'predict_proba':
            return self._clf.predict_proba(X)[:, self._axis]
        elif self._mode == 'decision_function':
            distances = self._clf.decision_function(X)
            if len(distances.shape) > 1:
                return distances[:, self._axis]
            else:
                return distances
        else:
            return self._clf.predict(X)

    def get_params(self, deep=True):
        return dict(clf=self._clf, mode=self._mode, axis=self._axis)

    def _copyattr(self):
        # Expose the fitted attributes of the wrapped classifier.
        for key, value in self._clf.__dict__.items():
            self.__setattr__(key, value)
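
A hypothetical usage sketch of this wrapper (not from the thread; it assumes binary labels encoded as 0/1, so roc_auc_score can consume the 1-D probability scores directly):

from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

# axis=1 selects the positive-class probability column.
clf = force_predict(SVC(probability=True), mode='predict_proba', axis=1)
gat = GeneralizationAcrossTime(clf=clf, scorer=roc_auc_score)
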
choldgraf commented 9 years ago

ah that's a cool way of going about it!

For the AUC stuff, I haven't done extensive testing, but I have generally found similar things to you. I think the main layperson's argument about the drawbacks of AUC is that a benefit of the ROC is its ability to tell you how the TP/FP ratio changes as a function of your decision cutoff; by reducing that information to a single point estimate (the AUC) you remove this information and make it about as valuable as accuracy. I have no way of confirming this one way or the other, though :)

kingjr commented 9 years ago

However, by reducing that information to a single point estimate (the AUC) you remove this information and make it about as valuable as accuracy.

I hope it's more than this, because the main point is that AUC is 1) non-parametric and 2) robust to unbalanced datasets and biases, whereas accuracy isn't necessarily. The classic case is when you train A versus B and test on C versus D. If, by mischance, C and D really look like A, they will be discretely predicted as A and accuracy drops to chance, whereas the AUC is resistant to such bias. But I guess other metrics have this advantage too.

Another interesting point is that the AUC is ~ the U statistic of the Mann-Whitney U test, the established non-parametric two-sample test, so that's a strong bridge; it's not just about TP/FP rates.
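
A toy sketch of the bias case described above (illustrative values only): predictions shifted wholesale toward one class destroy the accuracy, but leave the ranking, and hence the AUC, intact:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.55, 0.60, 0.80, 0.90])  # all biased above 0.5
y_hard = (scores > 0.5).astype(int)          # every trial labelled 1

print(accuracy_score(y_true, y_hard))  # 0.5 -> chance level
print(roc_auc_score(y_true, scores))   # 1.0 -> the ranking is perfect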

choldgraf commented 9 years ago

this sounds reasonable to me - I'm just playing devil's (uninformed) advocate :D

kingjr commented 9 years ago

And you should: if AUC turned out not to be great, we should know ASAP, before it's too late.

kaichogami commented 8 years ago

@kingjr I am hoping to solve this issue. If I understand correctly, our main aim is to integrate as much of sklearn into the GAT as possible, including the make_scorer function. However, for the time being, keeping ourselves limited to this PR, we have to make roc_auc_score work properly without requiring users to write custom functions. Please let me know if I am understanding correctly. Thank you.

kingjr commented 8 years ago

Some parts of the discussion are outdated. The GeneralizationAcrossTime object can use different predict methods (e.g. predict_method='predict_proba').

However, the scoring currently only takes a callable; it should also be able to take a string and evaluate it with sklearn (@dengemann suggested sklearn.metrics.SCORERS['roc_auc']).
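
A minimal sketch of the kind of string handling being requested (the helper name is hypothetical, not the eventual MNE API; it assumes sklearn exposes the SCORERS dict):

def _check_scorer(scoring):
    # Accept either a callable or a sklearn scorer name such as 'roc_auc'.
    if isinstance(scoring, str):
        from sklearn.metrics import SCORERS
        return SCORERS[scoring]
    return scoring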