tsai: State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
Hi @oguiza,

I am trying to solve a binary classification problem using `tsai`. I have a fairly large dataset, and I cannot use `apply_sliding_window` directly on it because I run into OOM. That is why I am now trying the `TSUnwindowedDataset[s]` routines, and I am unsure on several points whether I am doing this right.

For the following example I took just a part of the full dataset, so the shape of this slice does not really matter; it is just FYI.
```python
df.shape
# (2358720, 7)
```

Now I extract features and target from the slice:

```python
X = df.drop(columns=['time', 'target']).values
y = df['target'].values
type(X), type(y), X.shape, y.shape
# (numpy.ndarray, numpy.ndarray, (2358720, 5), (2358720,))
```
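For context on the OOM: materializing every sliding window copies each timestep roughly `window_size` times, which is what an unwindowed dataset avoids by slicing windows on demand. A toy numpy sketch of the blow-up (shapes are illustrative only, not my real data):

```python
import numpy as np

# Toy stand-in for the real data: (timesteps, features), same layout as X above
X_small = np.arange(40, dtype=np.float32).reshape(20, 2)
W = 5  # window size (the real example uses 50)

# Materializing the windows multiplies memory use by ~W:
# each of the (T - W + 1) windows stores W full rows.
windows = np.stack([X_small[i:i + W] for i in range(len(X_small) - W + 1)])
print(windows.shape)                    # (16, 5, 2): (n_windows, window_size, features)
print(windows.nbytes, X_small.nbytes)   # 640 vs 160: ~W times larger

# By contrast, sliding_window_view is a zero-copy view over the same buffer
view = np.lib.stride_tricks.sliding_window_view(X_small, W, axis=0)
print(view.shape)                       # (16, 2, 5)
```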
After I've created the splits, I create instances of the `TSUnwindowedDataset` and `TSUnwindowedDatasets` classes:
```python
WINDOW_SIZE = 50

def my_y_func(y_):
    return y_[:, -1]  # I need only the last item from the window of targets

ds = TSUnwindowedDataset(X=X, y=y, y_func=my_y_func, window_size=WINDOW_SIZE, seq_first=True)
dsets = TSUnwindowedDatasets(ds, splits=splits)
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, dsets[2],  # including the test part of the dataset
                               bs=256,
                               shuffle_train=False,
                               batch_tfms=TSStandardize(by_sample=True))
```
Checking the target is binary:
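A minimal version of that check, using a toy stand-in for the real target column:

```python
import numpy as np

y_toy = np.array([0, 1, 1, 0, 1])  # stand-in for the real target column
print(np.unique(y_toy))  # [0 1]: exactly two distinct values, so the target is binary
```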
And here is the first point: the class count is `1` instead of the expected `2` for binary classification. If I try to create a model and train it, I get an error.
This approach differs from the sample notebook, where a transformation is used for the target. However, `TSUnwindowedDataset` does not have such functionality. How do I properly introduce the target to the data loader in this case?
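For reference, my understanding of what that target transformation does, sketched in plain numpy (this is the idea, not the tsai API): it maps raw labels onto a vocab of contiguous class indices, which is where the expected class count of 2 would come from.

```python
import numpy as np

y_toy = np.array([0, 1, 1, 0, 1])  # stand-in for the window labels
classes, y_idx = np.unique(y_toy, return_inverse=True)
print(classes)  # [0 1]: the vocab; len(classes) == 2 is the class count
print(y_idx)    # [0 1 1 0 1]: class indices that would be fed to the loss
```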
As a temporary solution, I have tried to train the model like this:

This code trains the model, and I even get pretty good-looking charts at the end!

![Xnip2024-05-31_19-32-13](https://github.com/timeseriesAI/tsai/assets/5753506/dd6fedd9-1a3d-4085-b6b0-ba279ab74b2a)
But here is the second point: I don't know how to properly interpret the predictions. As for my target, `y[i] == 1` is *good* and `0` is *bad*. But what does `label[i] == 1` mean? It could mean the same as my target, but since the predictions are returned as probabilities of shape `(N, 2)`, I suspect it may mean the opposite.

So to check it, I've created a method:

...and tried both ways.

And here is the third point: I cannot reproduce a validation ROC AUC score anywhere near the one displayed on the chart. Both ways of comparing the predicted labels to my target on the validation subset give ROC AUC ~0.5, while the chart shows 0.75. Why does that happen? What am I missing?
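One thing I noticed while debugging this: ROC AUC changes depending on whether it is fed the positive-class probabilities or hard argmax labels (I believe fastai's `RocAucBinary` uses probabilities). A pure-numpy, rank-based sketch of that difference, with made-up numbers:

```python
import numpy as np

def roc_auc(y_true, scores):
    # Rank-based (Mann-Whitney) ROC AUC with tie-averaged ranks
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):          # average the ranks of tied scores
        ranks[scores == v] = ranks[scores == v].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = np.array([0, 0, 1, 1, 0, 1])
probs  = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # made-up P(class 1)

print(roc_auc(y_true, probs))                # 0.888...: AUC over probabilities
print(roc_auc(y_true, (probs > 0.5) * 1.0))  # 0.833...: AUC over hard labels differs
```

So comparing predicted *labels* to the target is not the same computation as the chart's metric if the chart is fed probabilities.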