Closed whalegeek closed 2 years ago
Hi @whalegeek, Could you please explain what you are trying to achieve with the last 2 lines of code?
interp = ClassificationInterpretation.from_learner(learn)
y_hat, new_y = flatten_check(interp.decoded, interp.targs)
Hi @oguiza, I appreciate your quick feedback. I would like to compute evaluation metrics such as these:
from sklearn.metrics import balanced_accuracy_score, f1_score, precision_score, confusion_matrix
accuracy_Bal = balanced_accuracy_score(new_y, y_hat)
f1 = f1_score(new_y, y_hat, average='weighted')
precision = precision_score(new_y, y_hat, average='weighted')
cnf_matrix = confusion_matrix(new_y, y_hat)
Br,
Ok, I understand. In that case, all you need to do is to generate the predictions. You can easily do that once training finishes using:
preds, targets, y_hat = learn.get_X_preds(new_X, new_y)
where new_X and new_y should have the same format as the X and y you used to train the model.
This code runs successfully:
preds, targets, y_hat = learn.get_X_preds(X, new_y)
However, this one fails:
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, precision_score, recall_score
accuracy = accuracy_score(new_y, y_hat)
The error:
ValueError Traceback (most recent call last)
Input In [53], in <cell line: 2>()
1 from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, precision_score, recall_score
----> 2 accuracy = accuracy_score(new_y, y_hat)
3 accuracy_Bal = balanced_accuracy_score(new_y, y_hat)
4 f1 = f1_score(new_y, y_hat, average='weighted')
File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:211, in accuracy_score(y_true, y_pred, normalize, sample_weight)
145 """Accuracy classification score.
146
147 In multilabel classification, this function computes subset accuracy:
(...)
207 0.5
208 """
210 # Compute accuracy for each possible representation
--> 211 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
212 check_consistent_length(y_true, y_pred, sample_weight)
213 if y_type.startswith("multilabel"):
File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:84, in _check_targets(y_true, y_pred)
57 def _check_targets(y_true, y_pred):
58 """Check that y_true and y_pred belong to the same classification task.
59
60 This converts multiclass or binary types to a common shape, and raises a
(...)
82 y_pred : array or indicator matrix
83 """
---> 84 check_consistent_length(y_true, y_pred)
85 type_true = type_of_target(y_true)
86 type_pred = type_of_target(y_pred)
File ~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py:332, in check_consistent_length(*arrays)
330 uniques = np.unique(lengths)
331 if len(uniques) > 1:
--> 332 raise ValueError(
333 "Found input variables with inconsistent numbers of samples: %r"
334 % [int(l) for l in lengths]
335 )
ValueError: Found input variables with inconsistent numbers of samples: [693599, 6919819]
I am trying different parameters to understand what the issue is.
It seems that X and new_y contain different numbers of samples. You need to pass two arguments, new_X and new_y, and they must contain the same number of samples.
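One way to catch this before calling sklearn is to compare the lengths first. The following is a minimal sketch in plain Python; `check_same_length` is a hypothetical helper (not part of tsai or sklearn), and the lists are toy data standing in for the real targets and predictions.

```python
def check_same_length(**arrays):
    """Raise a readable error if the named sequences differ in length."""
    lengths = {name: len(values) for name, values in arrays.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Inconsistent numbers of samples: {lengths}")
    # All lengths agree, so return the common one.
    return next(iter(lengths.values()))


# Toy data standing in for the real targets and predictions.
targets = [0, 1, 1, 0]
y_hat = [0, 1, 0, 0]
n_samples = check_same_length(targets=targets, y_hat=y_hat)
```

Because the error message names each input, it shows directly which variable has the unexpected length.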
Here is the code:
Error details:
ValueError Traceback (most recent call last)
Input In [64], in <cell line: 2>()
1 from sklearn.metrics import accuracy_score
----> 2 accuracy = accuracy_score(targets, y_hat)
File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:211, in accuracy_score(y_true, y_pred, normalize, sample_weight)
145 """Accuracy classification score.
146
147 In multilabel classification, this function computes subset accuracy:
(...)
207 0.5
208 """
210 # Compute accuracy for each possible representation
--> 211 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
212 check_consistent_length(y_true, y_pred, sample_weight)
213 if y_type.startswith("multilabel"):
File ~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:84, in _check_targets(y_true, y_pred)
57 def _check_targets(y_true, y_pred):
58 """Check that y_true and y_pred belong to the same classification task.
59
60 This converts multiclass or binary types to a common shape, and raises a
(...)
82 y_pred : array or indicator matrix
83 """
---> 84 check_consistent_length(y_true, y_pred)
85 type_true = type_of_target(y_true)
86 type_pred = type_of_target(y_pred)
File ~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py:332, in check_consistent_length(*arrays)
330 uniques = np.unique(lengths)
331 if len(uniques) > 1:
--> 332 raise ValueError(
333 "Found input variables with inconsistent numbers of samples: %r"
334 % [int(l) for l in lengths]
335 )
ValueError: Found input variables with inconsistent numbers of samples: [693599, 5580607]
Hi @whalegeek, sorry for the late reply. This is very strange. What is the shape of preds, targets and y_hat?
Hello @oguiza, here is some information about the variables:
Br,
The first issue I can spot is the X shape. X has 81 variables and 2 steps only. I wouldn't consider that a time series with just 2 steps. Are you sure that's the input you want to use? Remember the input to a time series model in tsai is always:
[n_samples x n_vars x n_steps]
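As a quick sanity check, the three axes can be named explicitly from the array's shape. This is only a sketch: `describe_input` is a hypothetical helper, and the sample count of 1000 is made up for illustration (only the 81 variables and 2 steps come from this thread).

```python
def describe_input(shape):
    """Name the three axes of a [n_samples x n_vars x n_steps] shape."""
    if len(shape) != 3:
        raise ValueError(f"Expected 3 dimensions, got {len(shape)}: {shape}")
    n_samples, n_vars, n_steps = shape
    return {"n_samples": n_samples, "n_vars": n_vars, "n_steps": n_steps}


# Hypothetical shape: 1000 samples, 81 variables, 2 steps.
info = describe_input((1000, 81, 2))
```

With a NumPy array this would be called as `describe_input(X.shape)`.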
I want to study how the different numbers of steps influence model performance. I will test with a different number of steps and will report the results later.
I understand. The issue, though, is that 2 steps is such a short sequence that it is smaller than the kernels used by the models, and this will likely cause problems. It would be good to test whether you get the same issue when the number of steps is bigger (e.g. 10, 20, 100).
I performed experiments using 10, 20 and 50 steps. Here are the results:
10 steps:
20 steps:
50 steps:
The number of steps is defined here:
window_length = 50
horizon = 0
X, y = SlidingWindow(window_length, sort_by=['Flow_ID', "Timestamp"], horizon=horizon, seq_first=True, get_x=df_train.columns[:-1], stride=49, start=0, get_y='Label')(df_train)
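For intuition about how the window length and stride determine the number of samples, here is a sketch of generic sliding-window arithmetic. It is an assumption that this matches tsai's `SlidingWindow` exactly, since its handling of `start`, `horizon` and padding may differ. `count_windows` is a hypothetical helper, and 1000 rows is a made-up figure.

```python
def count_windows(n_rows, window_length, stride):
    """Number of full windows a generic sliding window yields over n_rows."""
    if n_rows < window_length:
        return 0
    return (n_rows - window_length) // stride + 1


# Hypothetical data length; window_length and stride as in the call above.
n_windows = count_windows(n_rows=1000, window_length=50, stride=49)
```

Comparing such an estimate with `len(X)` and `len(y)` after windowing can help confirm that inputs and labels still line up.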
It was a library version compatibility issue. The versions compatible with the code are:
tsai : 0.2.25
fastai : 2.5.3
fastcore : 1.3.27
torch : 1.8.1
Br
Hi @whalegeek, I think your issue is caused by the preds being a "string". You can simply convert this "string" to the equivalent "int" values:
new_preds = [int(s) for s in preds[1:-1].split(', ')]
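If the string's spacing is not always exactly ", ", a slightly more tolerant variant is to parse it with the standard library's `ast.literal_eval`. This is a sketch under the assumption that preds is a string that looks like a Python list of integers; `parse_int_list` and the sample string are hypothetical.

```python
import ast


def parse_int_list(text):
    """Parse a string such as "[0, 1, 2]" into a list of ints."""
    values = ast.literal_eval(text)  # safely evaluates literals only
    return [int(v) for v in values]


# Hypothetical string-typed predictions.
preds_str = "[0, 1, 1, 2]"
new_preds = parse_int_list(preds_str)
```

Unlike splitting on a fixed separator, this also accepts input such as "[3,4]" with no space after the comma.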
Hello all, I am training a model using the following code:
The code runs fine. However, when I use the following code to add time steps to the dataset, the prediction returns the following error: