timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

cannot use Learner.get_X_preds #753

Closed zhaosiyuan1098 closed 1 year ago

zhaosiyuan1098 commented 1 year ago

Hello, I'm a beginner with tsai.

I have successfully trained a time series classification model using tsai, and here is my code:

    from tsai.all import *

    # split the data, build the datasets/dataloaders, then train and save
    splits = get_splits(y, valid_size=0.2, test_size=0.1, stratify=True,
                        random_state=23, shuffle=True)
    tfms = [None, [Categorize()]]
    x_dsets = TSDatasets(x_3d, y, tfms=tfms, splits=splits, inplace=True)
    batch_tfms = [TSStandardize(), TSNormalize()]
    bs = 64
    x_dls = TSDataLoaders.from_dsets(x_dsets.train, x_dsets.valid,
                                     bs=[bs, bs * 2], batch_tfms=batch_tfms)
    x_model = build_ts_model(XceptionTime, dls=x_dls)
    learn = Learner(x_dls, x_model, metrics=[accuracy, RocAuc()])
    learn.fit_one_cycle(100, 1e-3)
    learn.save_all(path='models', dls_fname='x_dls', model_fname='x_model',
                   learner_fname='x_learner')

However, when I use `Learner.get_X_preds` to predict the classification results, I encounter some issues:

    # reload everything and check predictions on the validation set
    x_learn = load_learner_all(path='models', dls_fname='x_dls',
                               model_fname='x_model', learner_fname='x_learner')
    dls = x_learn.dls
    valid_dl = dls.valid
    valid_probas, valid_targets, valid_preds = x_learn.get_preds(
        dl=valid_dl, with_decoded=True)
    print("x model accuracy = " + str((valid_targets == valid_preds).float().mean()))

I get satisfactory results: most valid_targets equal valid_preds, and valid_probas looks reasonable.

However, there are 12 kinds of true labels for `X_test`, but the predicted label from the trained model is always the same. Besides, `x_loss` is very big.

I've consulted the tsai documentation, and the official [Learner.get_X_preds](https://timeseriesai.github.io/tsai/inference.html) seems to have this issue:

The labels of `x` have four types: 0, 1, 2, and 3, but the predicted result is always type 1, and the probabilities for each class are almost equal:

    (tensor([[0.2632, 0.2575, 0.2431, 0.2362],
             [0.2632, 0.2575, 0.2431, 0.2363],
             [0.2631, 0.2575, 0.2431, 0.2363],
             [0.2631, 0.2575, 0.2431, 0.2363],
             [0.2632, 0.2575, 0.2431, 0.2363],
             [0.2632, 0.2575, 0.2431, 0.2362],
             [0.2632, 0.2575, 0.2431, 0.2362],
             [0.2631, 0.2575, 0.2432, 0.2362],
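
For reference, here is a minimal sketch (not the exact code from this issue) of how inference on new data is typically run with `Learner.get_X_preds`, following the tsai inference docs; `X_new` and `y_new` are hypothetical placeholders for the new samples and their labels:

    # minimal sketch, not the thread's actual code; X_new / y_new are
    # hypothetical placeholders: a float array of shape
    # (n_samples, n_variables, n_timesteps) and its labels
    from tsai.all import *

    learn = load_learner_all(path='models', dls_fname='x_dls',
                             model_fname='x_model', learner_fname='x_learner')
    probas, targets, preds = learn.get_X_preds(X_new, y_new, with_decoded=True)
    print(probas[:5])  # class probabilities
    print(preds[:5])   # decoded class labels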


May I ask why this looks like a classification failure, and how I can modify my code to fix it?

My computer environment is:

    os              : Windows-10-10.0.22000-SP0
    python          : 3.9.16
    tsai            : 0.3.5
    fastai          : 2.7.11
    fastcore        : 1.5.29
    torch           : 1.13.0
    device          : 1 gpu (['NVIDIA GeForce GTX 1650'])
    cpu cores       : 6
    threads per cpu : 2
    RAM             : 15.85 GB
    GPU memory      : [4.0] GB


This is my first time opening a GitHub issue and I'm not a native English speaker, so if my description is unclear or doesn't follow the community guidelines, please let me know and I will fix it soon. Thank you.
epdavid1 commented 1 year ago

Probably because your new data was not normalized before being passed to the model? This is a typical result of non-normalized/unscaled input data.
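
One way to check this, sketched below under the assumption that the training statistics were not applied to the new data: scale the new samples with the training set's per-channel mean and standard deviation before calling `get_X_preds`. `x_3d`, `splits`, and `x_learn` come from the code above; `X_new` is a hypothetical placeholder for the new data.

    # minimal sketch (assumption, not code from this thread): scale the new
    # data with the training set's per-channel statistics before inference;
    # X_new is a hypothetical array of shape (n_samples, n_variables, n_timesteps)
    import numpy as np

    X_train = x_3d[splits[0]]                         # training samples only
    mean = X_train.mean(axis=(0, 2), keepdims=True)   # per-variable mean
    std = X_train.std(axis=(0, 2), keepdims=True) + 1e-8
    X_new_scaled = (X_new - mean) / std

    probas, _, preds = x_learn.get_X_preds(X_new_scaled, with_decoded=True)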

zhaosiyuan1098 commented 1 year ago

Thanks for your help! I finally found out that this problem was caused by my new dataset and I have solved it successfully~