I wasn't aware of this prerequisite. Can you make an example where the scorer needs the classifier?
In http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics, I only see scorers that require y_true and y_pred, or the like, but I guess I'm missing something.
Are you referring to cross_validation.cross_val_score(estimator, X)? In our case, the cross-validation is already implemented within the GAT.
@kingjr sklearn.metrics.SCORERS['roc_auc'] returns a function that would take the clf as input too. I'm just saying we would need this kind of handling inside the gat. Ultimately I would like to simply pass scoring='roc_auc' as I do in cross_val_score.
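For context, a minimal sketch of the distinction being made here, using standard sklearn (the data and estimator are illustrative only): a plain metric such as roc_auc_score takes (y_true, y_score), whereas an entry of sklearn.metrics.SCORERS is called with the fitted estimator plus the raw data, so it decides itself whether to call decision_function or predict_proba (newer sklearn releases expose get_scorer for the same purpose).

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import SCORERS, roc_auc_score

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# plain metric: the caller must produce the scores itself
print(roc_auc_score(y, clf.predict_proba(X)[:, 1]))

# scorer: takes the estimator and the data, extracts the scores internally
print(SCORERS['roc_auc'](clf, X, y))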
Right... I need to think about this. But in the meantime, you can just use scorer=sklearn.metrics.roc_auc_score, no?
Jumping on this with a related question - if we use roc_auc_score, then doesn't that require that the model we've fit use predict_proba rather than just predict? I looked through the GAT code briefly and couldn't find a flag that said something like "if clf.probability is True: do predict_proba instead of predict".
The new scoring API in sklearn does this magic for you.
Yes, we decided to keep the GAT API simple for now. What I do to get probabilities is create a class that inherits from SVC and change the predict behavior into a predict_proba one.
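A minimal sketch of the kind of subclass described here (the class name is hypothetical; note that SVC only exposes predict_proba when constructed with probability=True):

from sklearn.svm import SVC

class SVCProba(SVC):
    """SVC whose predict() returns class probabilities instead of labels."""
    def predict(self, X):
        return super(SVCProba, self).predict_proba(X)

clf = SVCProba(probability=True)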
Ah ok - I wonder if one could just do svm.predict = svm.predict_proba and then go from there.
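As a rough sketch of that idea (hypothetical; note that sklearn's clone() rebuilds estimators from their constructor parameters, so an instance-level override like this would not survive the cloning done during cross-validation, which is one reason subclassing or wrapping is preferred):

from sklearn.svm import SVC

svm = SVC(probability=True)
svm.predict = svm.predict_proba  # rebind predict on this particular instance only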
Ah I can see why you guys wanted to keep the API simple for now... I was running into "incorrect shape" errors, so it seems like the simplest way to get the code working is to set the predict_proba function to only output the second column of its output probabilities (assuming that n_classes is 2).
@agramfort are you talking about the .17 release then? Or is this something that's already implemented?
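A minimal sketch of the "second column" workaround described just above (the class name is hypothetical; assumes a binary problem, so column 1 holds the probability of the positive class):

from sklearn.linear_model import LogisticRegression

class BinaryProba(LogisticRegression):
    def predict(self, X):
        # return only the positive-class probability, shape (n_samples,)
        return super(BinaryProba, self).predict_proba(X)[:, 1]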
That's right. And predict_proba is not the only prediction function out there... I sometimes use multidimensional predictions, but you have to build a scorer that knows what to do with them.
Cool - maybe I will write a post/example on getting this to work with AUC for future reference, thanks for your clarification!
@choldgraf that would be really useful. I just had to deal with this myself. I am calculating the scores from cross-validation and I wanted to have the AUCs too. @kingjr, for clarification, are you saying that you can create a scorer that outputs both the classification score and the AUC?
Cool, maybe I can hash one out as I'm visiting Paris next week. I think it would be particularly useful for AUC, since the paper that "introduced" the time-generalization method explicitly recommended AUC as a metric they used (current debate about usefulness of AUC notwithstanding)
We should start tagging issues and PRs for the sprint.
Please go for it, anyone. I've got some working examples already; I'll put a ref here.
Here's an example:

import mne
from mne.datasets import spm_face
from mne.decoding import GeneralizationAcrossTime
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Preprocess data
data_path = spm_face.data_path()

# Load and filter data, set up epochs
raw_fname = data_path + '/MEG/spm/SPM_CTF_MEG_example_faces%d_3D_raw.fif'
raw = mne.io.Raw(raw_fname % 1, preload=True)  # Take first run
picks = mne.pick_types(raw.info, meg=True, exclude='bads')
raw.filter(1, 45, method='iir')
events = mne.find_events(raw, stim_channel='UPPT001')
event_id = {"faces": 1, "scrambled": 2}
tmin, tmax = -0.1, 0.5
decim = 4  # decimate to make the example faster to run
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True,
                    picks=picks, baseline=None, preload=True,
                    reject=dict(mag=1.5e-12), decim=decim, verbose=False)

class clf_proba(LogisticRegression):
    """LogisticRegression whose predict() returns class probabilities."""
    def predict(self, X):
        self.y_pred_ = super(clf_proba, self).predict_proba(X)
        return self.y_pred_

def scorer_auc(y_true, y_pred):
    # y_pred holds the probabilities returned by clf_proba.predict
    from sklearn.metrics import roc_auc_score
    from sklearn.preprocessing import LabelBinarizer
    le = LabelBinarizer()
    y_true = le.fit_transform(y_true)
    return roc_auc_score(y_true, y_pred[:, 1])

clf = Pipeline([('scaler', StandardScaler()), ('clf', clf_proba())])
gat = GeneralizationAcrossTime(clf=clf, n_jobs=2, scorer=scorer_auc)
gat.fit(epochs)
gat.score(epochs)
gat.plot()
"current debate about usefulness of AUC notwithstanding"
Could you give a ref? Do you have alternative proposals we should use by default?
EDIT: This example is now outdated. Use GeneralizationAcrossTime(predict_method='predict_proba') instead.
@kingjr, for clarification, are you saying that you can create a scorer that outputs both the classification score and the AUC?
You could if you wanted to. The dimensionality of the scorer's output isn't restricted, if I recall correctly. However, I think the easiest is to run the scorer twice, without passing the epochs again.
Using the example above you'd have something like this:

gat = GeneralizationAcrossTime(clf=clf, n_jobs=2, scorer=scorer_auc)
gat.fit(epochs)
gat.predict(epochs)
gat.scorer_ = scorer_auc
scores_auc = gat.score()  # scores the cached predictions
gat.scorer_ = accuracy    # 'accuracy' stands for any scorer taking (y_true, y_pred)
scores_acc = gat.score()
Could you give a ref? Do you have alternative proposals we should use by default?
Seems like the wikipedia entry [1] has some good refs...here's the relevant paragraph:
The machine learning community most often uses the ROC AUC statistic for model comparison.[10] However, this practice has recently been questioned based upon new machine learning research that shows that the AUC is quite noisy as a classification measure[11] and has some other significant problems in model comparison.[12][13] A reliable and valid AUC estimate can be interpreted as the probability that the classifier will assign a higher score to a randomly chosen positive example than to a randomly chosen negative example. However, the critical research[11][12] suggests frequent failures in obtaining reliable and valid AUC estimates. Thus, the practical value of the AUC measure has been called into question,[13] raising the possibility that the AUC may actually introduce more uncertainty into machine learning classification accuracy comparisons than resolution. Nonetheless, the coherence of AUC as a measure of aggregated classification performance has been vindicated, in terms of a uniform rate distribution,[14] and AUC has been linked to a number of other performance metrics such as the Brier score.[15]
I don't have the statistical training to really know the ins and outs of the debate, but it seems like there is a debate anyway (I suppose there always is for anything in academia).
Thanks for the example, btw...very useful!
[1] https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve
Also per your example, couldn't you even implement a convenience mixin in case you wanted to try a different classifier, e.g.:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

class PredictMixin(object):
    def predict(self, X):
        # resolve to the wrapped classifier's predict_proba via the MRO
        self.y_pred_ = super(PredictMixin, self).predict_proba(X)
        return self.y_pred_

class myRF(PredictMixin, RandomForestClassifier):
    pass

class mySVM(PredictMixin, LinearSVC):  # note: LinearSVC has no predict_proba
    pass
or is that just super, super lazy? :)
Seems like the wikipedia entry [1] has some good refs...here's the relevant paragraph:
Mmmh, I wasn't aware of this at all. My impression so far is that AUC is much better than accuracy, at least in my datasets and simulations, both in terms of stability and usability (accuracy is just not very good for GAT analyses), although it's true that the AUC is more computationally expensive.
If anyone wants to give some practical feedback here, that'd be much appreciated.
couldn't you even implement a convenience mixin in case you wanted to try a different classifier
Sure! I didn't know this syntax #StillLearning. In my case, I have actually done something like this for generic use cases:
class force_predict(object):
    """Wrap a classifier so that predict() returns the chosen prediction type."""
    def __init__(self, clf, mode='predict_proba', axis=0):
        self._mode = mode
        self._axis = axis
        self._clf = clf

    def fit(self, X, y, **kwargs):
        self._clf.fit(X, y, **kwargs)
        self._copyattr()
        return self

    def predict(self, X):
        if self._mode == 'predict_proba':
            return self._clf.predict_proba(X)[:, self._axis]
        elif self._mode == 'decision_function':
            distances = self._clf.decision_function(X)
            if len(distances.shape) > 1:
                return distances[:, self._axis]
            else:
                return distances
        else:
            return self._clf.predict(X)

    def get_params(self, deep=True):
        return dict(clf=self._clf, mode=self._mode, axis=self._axis)

    def _copyattr(self):
        # expose the fitted attributes (e.g. classes_) on the wrapper itself
        for key, value in self._clf.__dict__.iteritems():
            self.__setattr__(key, value)
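For illustration, a hypothetical way to plug this wrapper into the earlier example (the helper scorer and the reuse of epochs are assumptions, not part of the original snippet; with axis=1, predict() already returns the positive-class probability as a 1-D array):

from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC
from mne.decoding import GeneralizationAcrossTime

def scorer_auc_1d(y_true, y_proba):
    # y_proba is already the positive-class probability thanks to axis=1
    y_true = LabelBinarizer().fit_transform(y_true).ravel()
    return roc_auc_score(y_true, y_proba)

clf = force_predict(SVC(probability=True), mode='predict_proba', axis=1)
gat = GeneralizationAcrossTime(clf=clf, scorer=scorer_auc_1d)
gat.fit(epochs)    # epochs as defined in the example earlier in the thread
gat.score(epochs)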
Ah, that's a cool way of going about it!
For the AUC stuff, I haven't done extensive testing but have generally found similar things to you. I think that the main layperson's argument about the drawbacks of AUC is that a benefit of ROC is its ability to tell you how the TP/FP ratio changes as a function of your decision cutoff. However, by reducing that information to a single point estimate (AUC) you remove this information and make it about as valuable as accuracy. I have no way of confirming this one way or the other though :)
However, by reducing that information to a single point estimate (AUC) you remove this information and make it about as valuable as accuracy.
I hope it's more than this, because the main point is that AUC is 1) non-parametric and 2) robust to unbalanced datasets and biases, whereas accuracy isn't necessarily. The classic case is when you train A versus B, and test on C versus D. If, unluckily, C and D really look like A, they will be discretely predicted as A and accuracy drops to chance, whereas the AUC will be resistant to such bias. But I guess other metrics have this advantage too.
Another interesting point is that AUC is ~ equivalent to the U of the Mann-Whitney U test, the established non-parametric two-sample test, so that's a strong bridge; it's not just about TP/FP rates.
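A small numerical check of that AUC / Mann-Whitney U link (illustration only, with made-up data): the U statistic of the positive-class scores divided by n_pos * n_neg reproduces the value returned by roc_auc_score.

import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
y_true = np.r_[np.zeros(50), np.ones(50)]
scores = np.r_[rng.randn(50), rng.randn(50) + 1.]  # positives score higher on average

ranks = rankdata(scores)  # ranks over all scores, ties averaged
n_pos, n_neg = 50, 50
U = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.
print(U / (n_pos * n_neg))            # AUC computed from the U statistic
print(roc_auc_score(y_true, scores))  # same value from sklearn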
This sounds reasonable to me - I'm just playing devil's (uninformed) advocate :D
And you should; if AUC turned out not to be great, we should know it ASAP, before it's too late.
@kingjr I am hoping to solve this issue. If I understand this correctly, our main aim is to integrate as much of sklearn into gat as possible, including the make_scorer function. However, for the time being, keeping ourselves limited to this PR, we have to make roc_auc_score work properly without requiring users to write custom functions.
Please let me know if I am understanding correctly.
Thank you
Some parts of the discussion are outdated. The GeneralizationAcrossTime object can use different predict methods (e.g. predict_method='predict_proba').
However, the scoring currently requires a callable; it should also be able to take a string and evaluate it with sklearn (@dengemann suggested sklearn.metrics.SCORERS['roc_auc']).
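A rough sketch of that string handling (a hypothetical helper, not the actual GAT code), assuming the scorer registry mentioned above; note that what it returns for a string is a scorer expecting the classifier, which is exactly the difficulty raised in the next comment:

from sklearn.metrics import SCORERS

def _resolve_scoring(scoring):
    """Turn a scoring argument into a callable, sklearn-style (hypothetical)."""
    if isinstance(scoring, str):
        return SCORERS[scoring]  # e.g. 'roc_auc' -> callable(clf, X, y)
    return scoring  # assume it is already a callable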
It seems the scoring API is a bit broken: the code does not allow for scorers that require the classifier as input. And this cannot even be fixed using a partial function, as the clf is copied inside. cc @kingjr