timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Is there any way of recording the wrong predictions into a txt file? #397

Closed · AnthonyFang623 closed this issue 2 years ago

AnthonyFang623 commented 2 years ago

I am wondering if the network could record the wrong predictions it makes on the dataset. That way I might spot a pattern in the misclassified files and adjust my data-preprocessing method.

oguiza commented 2 years ago

Hi @AnthonyFang623, There's a way to do it, although it is not as direct as I'd like it to be. This is a code snippet you can use to try this approach. Let's say you have already trained a model:

from tsai.all import *

X, y, splits = get_UCR_data('LSST', split_data=False)  # sample UCR dataset
tfms = [None, TSClassification()]                      # categorize the labels
batch_tfms = TSStandardize(by_sample=True)
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy, cbs=[ShowGraph()])
learn.fit_one_cycle(10, 1e-2)

You can use this code to extract the top loss indices:


interp = Interpretation.from_learner(learn)          # fastai's interpretation object
valid_top_losses, valid_idxs = interp.top_losses(9)  # the 9 highest validation losses
valid_top_losses, valid_idxs

Bear in mind that valid_idxs are indices into the validation split, so they need to be mapped back to the full dataset:

highest_loss_input_idxs = splits[1][valid_idxs]   # map back to indices in the full X
sel_X, sel_y = X[highest_loss_input_idxs], y[highest_loss_input_idxs]
new_dl = learn.dls.new_dl(sel_X, sel_y)           # dataloader with just those samples
new_dl.show_batch()
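
And since the original question was about recording the wrong predictions in a txt file, here is a minimal sketch of how that could be done with the learner above (wrong_predictions.txt is just an illustrative file name):

probas, targets = learn.get_preds(ds_idx=1)             # predictions on the validation set
preds = probas.argmax(dim=-1)                           # predicted class per sample
wrong = (preds != targets).nonzero().flatten()          # positions within the validation split
wrong_input_idxs = np.array(splits[1])[wrong.numpy()]   # map back to indices in the full X
np.savetxt('wrong_predictions.txt', wrong_input_idxs, fmt='%d')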
AnthonyFang623 commented 2 years ago

Hi @oguiza , this is really helpful, thank you very much! And here is another question: I use the new version of the ROCKET method that you developed; below is my code.

from tsai.all import *

X = np.load('mydata.npy')
y = np.load('mylabel.npy')
X2d = X[:]
X3d = to3d(X2d)    # reshape the data to 3d: (samples, variables, timesteps)
splits = get_splits(y, valid_size=.2, stratify=True, random_state=23, shuffle=True)
tfms = [None, [Categorize()]]
batch_tfms = [TSStandardize(by_sample=True)]
dls = get_ts_dls(X3d, y, splits=splits, tfms=tfms, drop_last=False, shuffle_train=False, batch_tfms=batch_tfms, bs=10_000)
model = build_ts_model(ROCKET, dls=dls)                       # untrained ROCKET feature extractor
X_train, y_train = create_rocket_features(dls.train, model)   # transform each split into features
X_valid, y_valid = create_rocket_features(dls.valid, model)

and my data works perfectly with RidgeClassifierCV

from sklearn.linear_model import RidgeClassifierCV

# note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2;
# on recent versions, standardize the features beforehand instead
ridge = RidgeClassifierCV(alphas=np.logspace(-8, 8, 17), normalize=True)
ridge.fit(X_train, y_train)
print(f'alpha: {ridge.alpha_:.2E}  train: {ridge.score(X_train, y_train):.5f}  valid: {ridge.score(X_valid, y_valid):.5f}')

But I also want to output some result figures such as losses, accuracy, and a confusion matrix, like with the InceptionTime model, as well as recall_score, precision_score and f1_score, like with SVM or any other linear classifier. I didn't find a solution for this in the tutorials when using RidgeClassifier. Are those ideas possible?

oguiza commented 2 years ago

Hi @AnthonyFang623, No, it's not possible. When you use a sklearn classifier, there's no fastai learner, so you'll need to use sklearn functionality to do what you want. The sklearn website contains lots of examples.
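
Since RidgeClassifierCV is fit in closed form, there are no per-epoch loss curves to plot, but the other metrics are easy to get from sklearn. A minimal sketch, reusing the ridge, X_valid and y_valid variables from the snippet above:

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

valid_preds = ridge.predict(X_valid)   # predicted class labels for the validation set
print(confusion_matrix(y_valid, valid_preds))
print('accuracy :', accuracy_score(y_valid, valid_preds))
print('precision:', precision_score(y_valid, valid_preds, average='macro'))
print('recall   :', recall_score(y_valid, valid_preds, average='macro'))
print('f1       :', f1_score(y_valid, valid_preds, average='macro'))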

AnthonyFang623 commented 2 years ago

Hi @oguiza , I see where I got it wrong. Thank you! But I found that in tutorial 02, when you choose the fastai classifier head as the classifier, you can output figures like with InceptionTime. However, I don't understand the code here:

def lin_zero_init(layer):
    # zero-initialize the weights (and bias, if present) of every nn.Linear layer
    if isinstance(layer, nn.Linear):
        nn.init.constant_(layer.weight.data, 0.)
        if layer.bias is not None: nn.init.constant_(layer.bias.data, 0.)

model = create_mlp_head(dls.vars, dls.c, dls.len)
model.apply(lin_zero_init)   # apply recursively to every submodule
learn = Learner(dls, model, metrics=accuracy, cbs=ShowGraph())

I assume lin_zero_init(layer) is meant to initialize a layer's weights and bias?

And about model = create_mlp_head, what does mlp_head mean? Is it the name of a classifier or a method of generating features? In the Python script it calls, I found I could change it to create_fc_head, create_conv_head or some other options, but I didn't find the related documentation in the project.
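
In the meantime, I tried printing the head to see what it builds, since printing an nn.Module lists its layers. For example, reusing the same call as above:

head = create_mlp_head(dls.vars, dls.c, dls.len)   # same call as in the tutorial snippet
print(head)                                        # printing an nn.Module lists its layers
print(sum(p.numel() for p in head.parameters()), 'parameters')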

vrodriguezf commented 2 years ago

There's a way to do it, although it is not as direct as I'd like it to be

Maybe it would be helpful to wrap that code snippet into a plot_top_losses function, kind of like what fastai does?
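
For illustration, a rough standalone sketch of such a wrapper, assembled from the snippet earlier in the thread (a hypothetical helper, not an existing tsai function):

def plot_top_losses(learn, X, y, splits, k=9):
    # hypothetical wrapper around the earlier snippet (illustration only)
    interp = Interpretation.from_learner(learn)
    top_losses, valid_idxs = interp.top_losses(k)   # the k highest validation losses
    input_idxs = splits[1][valid_idxs]              # map back to indices in the full X
    new_dl = learn.dls.new_dl(X[input_idxs], y[input_idxs])
    new_dl.show_batch()
    return top_losses, input_idxs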

oguiza commented 2 years ago

Hi @vrodriguezf, I agree it may be useful. I'll add it to the list of ideas.

oguiza commented 2 years ago

I believe this should be implemented as plot_best_losses and plot_worst_losses, with the option to select one or multiple classes.

vrodriguezf commented 2 years ago

IIRC, fastai has just one plot_top_losses, with an argument to choose whether you want the best or the worst losses.

oguiza commented 2 years ago

Hi, I've just added 2 new learner methods: top_losses and plot_top_losses. You just need to pass X and y and select k (the number of losses) and largest (True for the highest losses, False for the lowest). I think these address the enhancement requested above. I've tested them and they seem to work well. cc: @AnthonyFang623 , @vrodriguezf

learn.top_losses(X[splits[1]], y[splits[1]], k=9, largest=True)
learn.plot_top_losses(X[splits[1]], y[splits[1]], k=9, largest=True)
vrodriguezf commented 2 years ago

Amazing, Ignacio! I like it even more than the fastai version, since it's patched directly into the learner and doesn't need a separate Interpretation object :)