[QUESTION] Usage on Time Series Anomaly Detection?

Stormyfufufu commented 2 months ago

Hi,

I'm new to machine learning and am looking to solve a predictive/forecasting problem:

I have primarily text-based log events and corresponding time series data. From the log events, I know that given the information sequence ‘A’, the next sequence is most often ‘B’. However, several other, less likely sequences may also follow.

What I’m looking for is a learning method that can identify unusual information sequences so they can be reviewed and then validated as either truly anomalous or as a new but valid sequence item. Ideally, the method would be multivariate, so that it learns things like 'B most likely occurs an hour after A'.

My gut tells me this is a time series anomaly detection problem, but I have no idea where to start. Can this be solved with Darts? Any hints or insights are greatly appreciated.

Thanks!

madtoinou commented 2 months ago

Hi @Stormyfufufu,

The current anomaly detection module in Darts assumes that you have access to historical data without anomalies, so that you can train a forecasting model on it and then apply a scoring method between the forecasted and the observed values to detect anomalies.
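
The forecast-then-score idea can be illustrated without Darts itself. The sketch below is a toy stand-in, not the Darts API (Darts wraps this pattern in `ForecastingAnomalyModel` together with scorers such as `NormScorer`): it "trains" a hypothetical per-phase-mean forecaster on anomaly-free history of a periodic series (a 2 every 50 steps, otherwise 1), then scores new data by the absolute error between forecast and observation. The period, the injected anomaly and the 0.5 threshold are assumptions for illustration only.

```python
import numpy as np


def fit_phase_means(clean: np.ndarray, period: int) -> np.ndarray:
    """Toy 'forecaster': the average value observed at each position of the cycle."""
    phases = np.arange(len(clean)) % period
    return np.array([clean[phases == p].mean() for p in range(period)])


def anomaly_scores(observed: np.ndarray, phase_means: np.ndarray, start: int) -> np.ndarray:
    """Absolute error between observed values and the phase-mean forecast.
    `start` is the time index of observed[0], used to align the cycle."""
    period = len(phase_means)
    forecast = phase_means[(np.arange(len(observed)) + start) % period]
    return np.abs(observed - forecast)


period = 50
clean = np.array([2.0 if i % period == 0 else 1.0 for i in range(350)])  # anomaly-free history
phase_means = fit_phase_means(clean, period)

new = np.array([2.0 if i % period == 0 else 1.0 for i in range(350, 450)])
new[25] = 2.0  # inject an unexpected event 2 at t = 375

scores = anomaly_scores(new, phase_means, start=350)
threshold = 0.5  # hypothetical; would need tuning on real data
flagged = np.flatnonzero(scores > threshold) + 350
print(flagged)  # [375]
```

Any step whose error exceeds the threshold is flagged for review; how well this works depends entirely on how good the forecaster is on normal data.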

Based on your description, you should be able to use the approaches currently available, and each detected anomaly will correspond to an "exotic" item in the sequence that should be reviewed. However, these predictions will depend heavily on the sequences present in the training set, and you will have to find a good way to encode the "A" and "B" items as numerical values (which might not be straightforward).

I think that approaches relying on classification, clustering, or explicit modeling (a Markov chain?) might be more suitable than anomaly detection per se.
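
A first-order Markov chain is one way to make the "given A, B most often follows" idea explicit. This is a minimal NumPy sketch under stated assumptions, not something Darts provides: event codes are integers 0..n_states-1 (codes 1 to 8 would be shifted down by one), transition probabilities are estimated by counting consecutive pairs in anomaly-free history with a small additive smoothing `alpha`, and transitions below a hypothetical `min_prob` are flagged for review.

```python
import numpy as np


def fit_transition_matrix(events, n_states, alpha=1e-3):
    """Estimate P[i, j] = P(next event = j | current event = i) by counting
    consecutive pairs; alpha gives unseen transitions a small non-zero probability."""
    counts = np.full((n_states, n_states), alpha)
    for cur, nxt in zip(events[:-1], events[1:]):
        counts[cur, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def unlikely_transitions(events, P, min_prob=0.01):
    """Return (position, current, next, probability) for each rare transition,
    where position is the index of the 'next' event."""
    flagged = []
    for t, (cur, nxt) in enumerate(zip(events[:-1], events[1:]), start=1):
        if P[cur, nxt] < min_prob:
            flagged.append((t, cur, nxt, float(P[cur, nxt])))
    return flagged


history = [0, 1, 2] * 100  # toy anomaly-free history: A -> B -> C -> A ...
P = fit_transition_matrix(history, n_states=3)
print(unlikely_transitions([0, 1, 2, 0, 2, 1], P))  # flags positions 4 and 5
```

This chain only looks at the order of events; timing information such as "an hour after A" would have to be modeled separately.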

Let me know if anything remains unclear.

Stormyfufufu commented 2 months ago

Thank you for the reply, @madtoinou.

So far, I have parsed the log events into numerical values (1 to 8, representing eight possible events). My historical data contains around 500 data points without anomalies. Is that enough to train a forecasting model? I don't think this is a classification or clustering problem, but a Markov chain model could be useful. Can Darts be used to predict Markov decision processes?

I'm currently trying to follow the anomaly detection example in https://unit8co.github.io/darts/examples/22-anomaly-detection-examples.html to train a model and forecast the sequence.

madtoinou commented 1 month ago

One-hot encoding the events can be appropriate; it ultimately depends on the nature of the events.
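
With codes 1 to 8, feeding the integers directly implies an ordering (event 8 "greater than" event 1) that probably does not exist. Below is a minimal NumPy sketch of one-hot encoding, assuming exactly eight event codes numbered from 1; each resulting column could then become one component of a multivariate series (for example via Darts' `TimeSeries.from_values`).

```python
import numpy as np


def one_hot(events, n_events=8):
    """Map integer event codes 1..n_events to one-hot rows (code k -> column k - 1)."""
    codes = np.asarray(events, dtype=int) - 1
    if codes.min() < 0 or codes.max() >= n_events:
        # Without this check, a code of 0 would silently wrap to the last column
        raise ValueError("event codes must be in 1..n_events")
    return np.eye(n_events, dtype=np.float32)[codes]


encoded = one_hot([1, 3, 8, 3])
print(encoded.shape)  # (4, 8)
```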

500 data points could be enough for regression models but are probably too few for torch-based models; again, it depends on the nature and complexity of the patterns in your dataset.
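
To put the 500 points in perspective: a model that learns from sliding windows gets roughly one training sample per window position. A minimal sketch of that arithmetic, assuming one target step per window (`output_len=1`) and no other sampling scheme; the exact count depends on the model configuration.

```python
def n_training_windows(n_points: int, input_len: int, output_len: int = 1) -> int:
    """Number of (input window, target) pairs a sliding window can cut from one series."""
    return max(0, n_points - input_len - output_len + 1)


print(n_training_windows(500, input_len=20))  # 480
print(n_training_windows(350, input_len=50))  # 300
```

These windows overlap heavily, so a few hundred of them carry much less information than the count suggests, which is usually too little for a neural network with many weights.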

Technically, anomaly detection is already a classification problem: you try to label periods of the signal as normal or abnormal. Clustering, in turn, is one way to identify anomalies (see, for example, the Darts K-means scorer).
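
The clustering idea can be sketched as follows: cluster windows of anomaly-free history, then score new windows by their distance to the closest centroid. This uses scikit-learn's `KMeans` directly and only illustrates the idea; it is not the implementation of the Darts scorer. The window width of 10, the 11 clusters, the injected event code 3 and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def windows(x, width):
    """Overlapping windows of length `width`, one per row (copied to a regular array)."""
    return np.lib.stride_tricks.sliding_window_view(np.asarray(x, dtype=float), width).copy()


width = 10
clean = np.array([2 if i % 50 == 0 else 1 for i in range(350)], dtype=float)
# Fit k-means on windows of anomaly-free history
kmeans = KMeans(n_clusters=11, n_init=10, random_state=0).fit(windows(clean, width))

new = np.array([2 if i % 50 == 0 else 1 for i in range(350, 450)], dtype=float)
new[25] = 3  # inject an unseen event code at t = 375

# Anomaly score of each window = distance to the closest centroid
scores = kmeans.transform(windows(new, width)).min(axis=1)
flagged_starts = np.flatnonzero(scores > 0.5) + 350
print(flagged_starts)  # the windows whose span covers t = 375
```

Note the limitation: because each window (10 steps) is shorter than the 50-step cycle, a "2" at the wrong time still looks like a normal window and would not be flagged.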

Darts can be used to predict Markov decision processes, but it is probably not ideal.

The notebook is a good start; feel free to close this issue if your original question was answered.

Stormyfufufu commented 1 month ago

Sorry for the late reply, @madtoinou; I was busy with other things over the past few days.

I followed the Darts tutorials and they work as shown, but when I apply the models to my data, the results are not what I expect. I suspected my data was too complex, with varying time intervals and too many possible events, so I simplified the time intervals to a constant time step of 1 and created a very simple dataset with only two numeric events ("1" or "2").

Could you please help me choose a model or set up the prediction? I created a very simplified dataset that contains only two events: "event 2" always occurs every 50 time steps, and "event 1" occurs otherwise. It is generated roughly as follows:

    import pandas as pd
    series = pd.DataFrame({"event": [2 if i % 50 == 0 else 1 for i in range(350)]})

I tried model = RNNModel(input_chunk_length=20, model='rnn') and ...(...model = 'lstm'), expecting a forecast that reproduces the data cycle, but it doesn't work: the prediction looks odd (see the attached image). What should I use in this case to predict the data cycles?

Thank you very much in advance for a reply, and many thanks if you don't mind sharing some code with me.

[Attached image: Figure_1, the forecast plot]

madtoinou commented 1 month ago

Hi @Stormyfufufu,

I cannot really recommend a specific model; just start with regression models, get some results, and then experiment with boosted trees and deep learning. The notebook contains step-by-step instructions for training a model and running inference.
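
A regression baseline can be as simple as ordinary least squares on lagged values. The plain NumPy sketch below is a hypothetical stand-in for a Darts regression model such as `LinearRegressionModel` with its `lags` parameter: it fits the two-event toy series from this thread with 20 and with 50 lags and reports the largest in-sample error for each.

```python
import numpy as np


def lag_matrix(x: np.ndarray, n_lags: int):
    """Row i holds x[i], ..., x[i + n_lags - 1]; its target is x[i + n_lags]."""
    X = np.lib.stride_tricks.sliding_window_view(x[:-1], n_lags)
    y = x[n_lags:]
    return X, y


def fit_predict_linear(x: np.ndarray, n_lags: int):
    """Ordinary least squares on lagged values plus an intercept; returns in-sample predictions."""
    X, y = lag_matrix(x, n_lags)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef, y


series = np.array([2.0 if i % 50 == 0 else 1.0 for i in range(350)])
max_err = {}
for n_lags in (20, 50):
    pred, y = fit_predict_linear(series, n_lags)
    max_err[n_lags] = float(np.abs(pred - y).max())
print(max_err)
```

With 20 lags, the window before each "2" is all ones, exactly like the windows before most "1"s, so no fit can tell them apart; with 50 lags, the value 50 steps back equals the target and the fit becomes exact.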

The forecast you shared is not unexpected. The dataset you used to fit the model is probably quite small, and with input_chunk_length=20 the model never "sees" the seasonality in the data (since "2" occurs every 50 steps, you should get perfect results if you swap the values in the example). Instead, the model only has access to a single value across all its input features, and event "2" happens so rarely (label imbalance) that the model minimized its loss function by always predicting 1.05. This is what I meant when I said that this problem is more classification-oriented than regression-oriented: you are trying to predict discrete labels, not continuous numerical values, and the "raw" model architectures will not be suitable. I would recommend tabularizing your dataset manually (especially if the frequency is not consistent) and leveraging sklearn classification models; you will learn much more this way.
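
A minimal sketch of manual tabularization followed by an sklearn classifier. The feature set is an assumption made for illustration: the previous 5 event codes plus a hypothetical "steps since the last event 2" column (with irregular timestamps, this could become elapsed time instead). `DecisionTreeClassifier` is just one possible model, and the chronological 200-row split is arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tabularize(events, n_lags, rare_event=2):
    """One row per time step t: the previous n_lags event codes plus a hypothetical
    feature, the number of steps since `rare_event` last occurred.
    The target is the event observed at t (a discrete label, not a float)."""
    rows, targets = [], []
    last_rare = None
    for t, event in enumerate(events):
        if t >= n_lags and last_rare is not None:
            rows.append(list(events[t - n_lags:t]) + [t - last_rare])
            targets.append(event)
        if event == rare_event:
            last_rare = t  # updated after building the row, so no future info leaks in
    return np.array(rows), np.array(targets)


events = [2 if i % 50 == 0 else 1 for i in range(350)]
X, y = tabularize(events, n_lags=5)

# Chronological split: fit on the first 200 rows, evaluate on the rest
clf = DecisionTreeClassifier(random_state=0).fit(X[:200], y[:200])
accuracy = clf.score(X[200:], y[200:])
print(accuracy)
```

For anomaly detection, an observed event to which the classifier assigned a low probability (see `predict_proba`) could be flagged for review.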

If you need a refresher, please have a look at the quickstart notebook. If you are new to machine learning, I would also recommend watching videos and finding learning materials to get more familiar with the underlying maths.

Stormyfufufu commented 1 month ago

@madtoinou I see. Thank you very much for the explanation. I guess I'll first have to read through the learning materials and learn step by step instead of jumping straight into an implementation.

unit8co/darts issue #2541