whalegeek commented 2 years ago

Hello all, I am training a model using the following code:


X, y = df2xy(df_train, #steps_in_rows=True, 
             sort_by=["Flow_ID", "Timestamp"], data_cols=df_train.columns[0:-1],
             target_col='Label')

splits = get_splits(y, valid_size=.5, balance=True, stratify=True, random_state=23, shuffle=True)

tfms  = [None, [Categorize()]]
dsets = TSDatasets(X, new_y, tfms=tfms, splits=splits, inplace=True)
dsets

bs = 1024
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[bs, bs*2], batch_tfms=[TSStandardize(by_var=True)], num_workers=0)

model = build_ts_model(XCM, dls=dls)
learn = ts_learner(dls, arch=XCM, metrics=metrics, cbs=config["cbs"])

learn.fit_one_cycle(config["n_epoch"], lr_max=config["lr"])

interp = ClassificationInterpretation.from_learner(learn)
y_hat, new_y = flatten_check(interp.decoded, interp.targs)

The code runs fine. However, when I use the following code to add time steps to the dataset, the prediction step returns the following error:


X, y = SlidingWindow(window_length,  sort_by=['Flow_ID', "Timestamp"], #'MyIdx'
                     horizon=horizon, seq_first=True, get_x=df_train.columns[:-1], 
                      stride=3, start=0, 
                     get_y='Label')(df_train)

y_hat, new_y = flatten_check(interp.decoded, interp.targs)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [78], in <cell line: 2>()
      1 start = time.time()
----> 2 y_hat, new_y = flatten_check(interp.decoded, interp.targs)
      3 #ss, new_y, y_hat = learn.get_preds(dl=learn.valid.ds, with_decoded=True, save_preds=None, save_targs=None)
      4 print(time.time() - start)

AttributeError: 'ClassificationInterpretation' object has no attribute 'decoded'

Does anyone know what the issue is, or a different way to make predictions?
Br

oguiza commented 2 years ago

Hi @whalegeek, Could you please explain what you are trying to achieve with the last 2 lines of code?

interp = ClassificationInterpretation.from_learner(learn)
y_hat, new_y = flatten_check(interp.decoded, interp.targs)

whalegeek commented 2 years ago

Hi @oguiza, I appreciate your quick feedback. I would like to compute evaluation results for several metrics, like this:

from sklearn.metrics import  balanced_accuracy_score, f1_score,  precision_score,  confusion_matrix
accuracy_Bal = balanced_accuracy_score(new_y, y_hat)
f1 = f1_score(new_y, y_hat, average='weighted')
precision = precision_score(new_y, y_hat, average='weighted')
cnf_matrix = confusion_matrix(new_y, y_hat)

Br,

oguiza commented 2 years ago

Ok, I understand. In that case, all you need to do is to generate the predictions. You can easily do that once training finishes using:

preds, targets, y_hat = learn.get_X_preds(new_X, new_y)

where new_X and new_y should have the same format as the X and y you used to train the model.

whalegeek commented 2 years ago

This code runs successfully:

preds, targets, y_hat = learn.get_X_preds(X, new_y)

However, this one fails:

from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score,  precision_score, recall_score
accuracy = accuracy_score(new_y, y_hat)

The error:
ValueError                                Traceback (most recent call last)
Input In [53], in <cell line: 2>()
      1 from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score,  precision_score, recall_score
----> 2 accuracy = accuracy_score(new_y, y_hat)
      3 accuracy_Bal = balanced_accuracy_score(new_y, y_hat)
      4 f1 = f1_score(new_y, y_hat, average='weighted')

File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:211, in accuracy_score(y_true, y_pred, normalize, sample_weight)
    145 """Accuracy classification score.
    146 
    147 In multilabel classification, this function computes subset accuracy:
   (...)
    207 0.5
    208 """
    210 # Compute accuracy for each possible representation
--> 211 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    212 check_consistent_length(y_true, y_pred, sample_weight)
    213 if y_type.startswith("multilabel"):

File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:84, in _check_targets(y_true, y_pred)
     57 def _check_targets(y_true, y_pred):
     58     """Check that y_true and y_pred belong to the same classification task.
     59 
     60     This converts multiclass or binary types to a common shape, and raises a
   (...)
     82     y_pred : array or indicator matrix
     83     """
---> 84     check_consistent_length(y_true, y_pred)
     85     type_true = type_of_target(y_true)
     86     type_pred = type_of_target(y_pred)

File ~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py:332, in check_consistent_length(*arrays)
    330 uniques = np.unique(lengths)
    331 if len(uniques) > 1:
--> 332     raise ValueError(
    333         "Found input variables with inconsistent numbers of samples: %r"
    334         % [int(l) for l in lengths]
    335     )

ValueError: Found input variables with inconsistent numbers of samples: [693599, 6919819]

I am trying different parameters to understand what the issue is.

oguiza commented 2 years ago

It seems that X and new_y contain a different number of samples. You need to pass two arguments, new_X and new_y, and they must contain the same number of samples.
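One way to catch this kind of mismatch before calling a metric is to check the type and length of both inputs first. The following is a minimal sketch in plain Python; `describe` and `check_same_length` are hypothetical helper names, not part of tsai or scikit-learn, and the data is made up.

```python
def describe(name, values):
    """Return a short summary with the type and length of `values`."""
    return f"{name}: type={type(values).__name__}, len={len(values)}"


def check_same_length(y_true, y_pred):
    """Raise a readable error if the two inputs cannot be compared element-wise."""
    for name, values in (("y_true", y_true), ("y_pred", y_pred)):
        # A str also has a length (its number of characters), so reject it explicitly.
        if isinstance(values, (str, bytes)):
            raise TypeError(
                f"{name} is a {type(values).__name__}, expected a sequence of labels"
            )
    if len(y_true) != len(y_pred):
        raise ValueError(
            "length mismatch -> "
            + describe("y_true", y_true)
            + "; "
            + describe("y_pred", y_pred)
        )


# Toy data (made up for illustration)
targets = [0, 1, 1, 0]
y_hat = [0, 1, 0, 0]
check_same_length(targets, y_hat)  # passes silently
print(describe("targets", targets))
print(describe("y_hat", y_hat))
```

Running the same check on the real `targets` and `y_hat` right after `get_X_preds` would show which of the two has the unexpected type or length.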

whalegeek commented 2 years ago

Here is the code:

Error details:

ValueError                                Traceback (most recent call last)
Input In [64], in <cell line: 2>()
      1 from sklearn.metrics import accuracy_score
----> 2 accuracy = accuracy_score(targets, y_hat)

File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:211, in accuracy_score(y_true, y_pred, normalize, sample_weight)
    145 """Accuracy classification score.
    146 
    147 In multilabel classification, this function computes subset accuracy:
   (...)
    207 0.5
    208 """
    210 # Compute accuracy for each possible representation
--> 211 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    212 check_consistent_length(y_true, y_pred, sample_weight)
    213 if y_type.startswith("multilabel"):

File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:84, in _check_targets(y_true, y_pred)
     57 def _check_targets(y_true, y_pred):
     58     """Check that y_true and y_pred belong to the same classification task.
     59 
     60     This converts multiclass or binary types to a common shape, and raises a
   (...)
     82     y_pred : array or indicator matrix
     83     """
---> 84     check_consistent_length(y_true, y_pred)
     85     type_true = type_of_target(y_true)
     86     type_pred = type_of_target(y_pred)

File ~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py:332, in check_consistent_length(*arrays)
    330 uniques = np.unique(lengths)
    331 if len(uniques) > 1:
--> 332     raise ValueError(
    333         "Found input variables with inconsistent numbers of samples: %r"
    334         % [int(l) for l in lengths]
    335     )

ValueError: Found input variables with inconsistent numbers of samples: [693599, 5580607]

oguiza commented 2 years ago

Hi @whalegeek, sorry for the late reply. This is very strange. What is the shape of preds, targets and y_hat?

whalegeek commented 2 years ago

Hello @oguiza, here is some information about the variables:

Br,

oguiza commented 2 years ago

The first issue I can spot is the X shape. X has 81 variables and 2 steps only. I wouldn't consider that a time series with just 2 steps. Are you sure that's the input you want to use? Remember the input to a time series model in tsai is always:

   [n_samples x n_vars x n_steps]
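To make this layout concrete, here is a small sketch in plain Python, with nested lists standing in for arrays. The `shape` and `to_vars_by_steps` helpers and the toy numbers are made up for illustration; tsai itself works on NumPy arrays or tensors.

```python
def shape(nested):
    """Return the dimensions of a regular, non-empty nested list, e.g. (2, 3, 4)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def to_vars_by_steps(window_rows):
    """Transpose one window from [n_steps][n_vars] (rows of a table)
    to [n_vars][n_steps] (one sequence per variable)."""
    return [list(col) for col in zip(*window_rows)]


# Two samples, each a window of 4 time steps (rows) with 3 variables (columns).
windows_as_rows = [
    [[1, 10, 100], [2, 20, 200], [3, 30, 300], [4, 40, 400]],
    [[5, 50, 500], [6, 60, 600], [7, 70, 700], [8, 80, 800]],
]

X = [to_vars_by_steps(w) for w in windows_as_rows]
print(shape(X))  # (n_samples, n_vars, n_steps)
print(X[0][1])   # second variable of the first sample across its 4 steps
```

In this layout the last dimension is the sequence length, so an X with 81 variables and 2 steps would have a shape ending in (81, 2).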

whalegeek commented 2 years ago

I want to study how the different numbers of steps influence model performance. I will test with a different number of steps and will report the results later.

oguiza commented 2 years ago

I understand. The issue, though, is that 2 is such a short sequence that it is smaller than the kernels used by the models, and this will likely cause problems. It would be good to test whether you get the same issue when the number of steps is larger (e.g. 10, 20, 100).

whalegeek commented 2 years ago

I ran experiments using 10, 20 and 50 steps. Here are the results:

10 steps: 10 steps error

20 steps: 20 steps error

50 steps: 50 steps Error

whalegeek commented 2 years ago

The number of steps is defined here:

window_length = 50
horizon = 0
X, y = SlidingWindow(window_length, sort_by=['Flow_ID', "Timestamp"], horizon=horizon, seq_first=True,
                     get_x=df_train.columns[:-1], stride=49, start=0, get_y='Label')(df_train)

stride = window_length - 1
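With `stride = window_length - 1`, consecutive windows share exactly one row. A rough sketch of the window boundaries follows, assuming a plain sliding window over `n_rows` rows. `window_bounds` is a hypothetical helper and `n_rows=200` is a made-up size; the real `SlidingWindow` in tsai may handle `horizon`, sorting and partial windows differently.

```python
def window_bounds(n_rows, window_length, stride, start=0):
    """Return (first_row, last_row_exclusive) for every full window."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    bounds = []
    first = start
    # Keep only windows that fit entirely inside the available rows.
    while first + window_length <= n_rows:
        bounds.append((first, first + window_length))
        first += stride
    return bounds


window_length = 50
stride = window_length - 1
bounds = window_bounds(n_rows=200, window_length=window_length, stride=stride)
print(bounds)
print(len(bounds), "windows")
```

The number of windows produced this way is the number of samples in X, and y needs one label per window for the two to line up.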

whalegeek commented 2 years ago

The problem was a library version compatibility issue. The versions compatible with the code are:

tsai: 0.2.25
fastai: 2.5.3
fastcore: 1.3.27
torch: 1.8.1

Br

farnaz-orooji commented 2 years ago

Hi @whalegeek, I think your issue is caused by the preds being returned as a string. You can simply convert this string to the equivalent int values:

new_preds = [int(s) for s in preds[1:-1].split(', ')]
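If the predictions really are a single string such as "[0, 1, 1]", its `len` is the number of characters rather than the number of samples, which may explain a length much larger than the number of targets. Here is a minimal sketch of the conversion that also shows `ast.literal_eval` from the standard library as an alternative to manual splitting. The sample string is made up, since the actual content of `preds` is not shown in the thread.

```python
import ast

preds_as_text = "[0, 1, 1, 0, 2]"  # made-up example of predictions stored as text

# Manual parsing, as suggested above: strip the brackets and split on ", ".
manual = [int(s) for s in preds_as_text[1:-1].split(', ')]

# Alternative: let the standard library parse the list literal safely.
parsed = ast.literal_eval(preds_as_text)

print(len(preds_as_text))  # number of characters in the string
print(len(parsed))         # number of predictions
print(manual == parsed)
```

The manual split depends on the separator being exactly a comma followed by one space, while `literal_eval` tolerates other spacing.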

timeseriesAI / tsai

AttributeError: 'ClassificationInterpretation' object has no attribute 'decoded' #487
