timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

MiniRocket giving different test accuracies on the same test data with every model run #745

Open hailthedawn opened 1 year ago

hailthedawn commented 1 year ago

I am using the MiniRocket classifier to perform emotion detection on ~1700 utterances. I've split the data into train and test sets, with about 300-400 utterances in the test set. Each utterance corresponds to one of three output labels. Every time I run the build-model -> get-test-accuracy cells, I get very different test accuracies (ranging from 30% to 80%). What could be the reason for this? I know my dataset is small - is that the cause? What do you recommend I do to get more consistent results with MiniRocket?

Note: What I am actually providing as input to the classifier is CNN embeddings corresponding to each utterance.
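One thing worth noting: with a test set of 300-400 samples, sampling noise alone cannot explain a 30-80% swing. A quick binomial back-of-the-envelope (a sketch, not tsai code) shows the expected run-to-run spread from test-set size is only a few percentage points, so the variance must come from somewhere else (e.g. MiniRocket's random kernel parameters or a reshuffled train/test split on each run):

```python
import math

def accuracy_std(p: float, n: int) -> float:
    """Std. dev. of the measured accuracy on a test set of n samples
    when the true accuracy is p (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

# With ~300 test utterances and a true accuracy around 55%,
# test-set sampling noise alone gives roughly a +/-3 point spread:
sd = accuracy_std(0.55, 300)
print(f"+/-{sd:.1%} per run")  # roughly +/-2.9%
```

So a 50-point swing points at the model-building step (or the split) changing between runs, not at measurement noise.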

hailthedawn commented 1 year ago

@oguiza Sorry for the tag - I noticed you have answered Minirocket questions in the past, so thought I would ask you if you have any recommendations wrt this?

oguiza commented 1 year ago

Hi @hailthedawn, That's strange. I haven't seen that type of variation in the score. Are you using something similar to this?

from tsai.basics import *                               # provides get_UCR_data, timer, etc.
from tsai.models.MINIROCKET import MiniRocketClassifier

# Univariate classification with an sklearn-style API
dsid = 'OliveOil'
X_train, y_train, X_valid, y_valid = get_UCR_data(dsid)  # download the UCR dataset

# Computes MiniRocket features using the original (non-PyTorch) MiniRocket code,
# then feeds them to sklearn's RidgeClassifier (a linear classifier).
model = MiniRocketClassifier()
timer.start(False)
model.fit(X_train, y_train)
t = timer.stop()
print(f'valid accuracy    : {model.score(X_valid, y_valid):.3%} time: {t}')
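If the split or the classifier's random state changes between runs, single-split scores can be misleading on a small dataset. One way to get a more stable estimate is k-fold cross-validation over a fixed split. Below is a minimal sketch using sklearn's RidgeClassifierCV (the same kind of linear head MiniRocket feeds its features into) on synthetic stand-in features; the array shapes and the injected class signal are assumptions for illustration, not the asker's data:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for precomputed MiniRocket/CNN features:
# 1700 samples, 100 features, 3 classes (hypothetical sizes).
rng = np.random.default_rng(42)
X = rng.normal(size=(1700, 100))
y = rng.integers(0, 3, size=1700)
X[y == 1] += 0.3  # inject a weak, learnable class signal

clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 7))
scores = cross_val_score(clf, X, y, cv=5)  # 5 folds -> 5 accuracy estimates
print(f"accuracy: {scores.mean():.1%} +/- {scores.std():.1%}")
```

Reporting the mean and spread over folds makes it obvious whether the instability comes from the split or from the model itself.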
cedced19 commented 9 months ago

Hi, this is strange. I don't see the same behavior as you: I get good results, until it comes to classifying the PenDigits dataset (from the UCR archive), where it always gives me the same accuracy: 0.106 ... I think I'm missing something.
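A constant 0.106 on PenDigits is suspicious on its own: PenDigits has 10 classes, so that score is essentially chance level, which usually means the classifier is predicting a single class for everything. One possible cause (an assumption worth verifying against the tsai/MiniRocket docs): MiniRocket's fixed convolutional kernels are length 9, and PenDigits series are only 8 timesteps long, which may be too short for the transform to work. A small diagnostic sketch, with hypothetical predictions:

```python
import numpy as np

def diagnose(y_pred, n_classes, series_len, min_len=9):
    """Sanity checks for chance-level accuracy.

    Returns (degenerate, too_short):
    - degenerate: nearly all predictions fall in one class
    - too_short: series shorter than MiniRocket's length-9 kernels
      (min_len=9 is an assumption about the minimum supported length)
    """
    counts = np.bincount(y_pred, minlength=n_classes)
    degenerate = counts.max() / counts.sum() > 0.95
    too_short = series_len < min_len
    return degenerate, too_short

# PenDigits: 10 classes, series length 8; hypothetical collapsed predictions
y_pred = np.zeros(500, dtype=int)
print(diagnose(y_pred, n_classes=10, series_len=8))  # (True, True)
```

If `degenerate` is True, check the label encoding and the feature pipeline before blaming the model; if the series really are shorter than 9 timesteps, padding or a different transform may be needed.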