Open · yingbaihu opened this issue 9 months ago
Thank you for your interest in our work.
In time series anomaly detection, UCR is a good dataset, but its drawback is that it is hard to carve out a validation set, because each test set contains only one anomalous segment; the UCR authors acknowledge this problem themselves. Therefore, for UCR we treat the sample with the largest anomaly score as anomalous, and the F1 obtained this way is the best F1 over all thresholds. If a model's anomaly scores are not very discriminative, many samples share the maximum score; in that case the accuracy is naturally low and the F1 is very poor. The threshold in such classification tasks is genuinely hard to determine, which is why the commonly used metrics are best F1 or AUC. For UCR you only need to take the maximum value; for datasets like KPI that contain many anomalous segments, you can search for the threshold over the 0–100% quantiles of the scores, or convert the scores to Z-scores and search within (-3, 3).

I didn't understand this: "In evaluation mode, you didn't split the data into batches, which means the whole test set must have fewer samples than the batch size (512)". Why should it be less than 512?
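For datasets with many anomalous segments, the threshold search described above could look roughly like the following sketch. This is illustrative only, not the repository's 'ad_predict'; the function name and the candidate grid are assumptions.

```python
# Minimal sketch (not the repository's ad_predict): pick the threshold that
# maximizes F1 by sweeping candidates over the 0-100% quantiles of the scores.
import numpy as np
from sklearn.metrics import f1_score

def best_f1_threshold(scores, labels, n_candidates=100):
    # Candidate thresholds: the quantiles of the anomaly scores.
    candidates = np.quantile(scores, np.linspace(0.0, 1.0, n_candidates + 1))
    best_f1, best_thr = 0.0, candidates[-1]
    for thr in candidates:
        preds = (scores >= thr).astype(int)
        f1 = f1_score(labels, preds, zero_division=0)
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_f1, best_thr
```

For UCR, as stated above, the threshold is simply the maximum anomaly score, so no search is needed.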
We observed that the means and variances of some training sets and test sets differ considerably. Strictly speaking, those training and test sets come from different distributions, which is a concept-drift problem. That is why both the test set and the training set were considered during normalization, and this indeed introduces data leakage. Our subsequent work found that on the UCR and KPI datasets, considering only the statistics of the training set (e.g., only the training mean) also achieves good performance.
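A minimal sketch of leak-free standardization, fitting the statistics on the training split only (the function name is hypothetical; the repository's 'MeanVarNormalize' currently fits on train + test):

```python
# Minimal sketch: standardize both splits with statistics taken from the
# training set only, so no test information leaks into preprocessing.
import numpy as np

def normalize_train_only(train, test, eps=1e-8):
    mean = train.mean()
    std = train.std()
    return (train - mean) / (std + eps), (test - mean) / (std + eps)
```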
- because there is only one abnormal fragment in the test set
Thanks for the reply, I still have some questions.
And one more question: why did you introduce the soft-boundary invariance? Is it just because the training dataset contains anomalies, or is there another reason? If the training dataset contains no anomalies, is the soft-boundary invariance still suitable in that case?
Our model generates anomaly scores per time window, i.e., one anomaly score per sample/time window, so we need to convert the original point-wise labels of the dataset into window-based labels. Only if the time step is set to 1 can scores be computed point by point, but that calculation is too slow. Reconstruction-based methods (e.g., LSTM-ED) naturally generate point-by-point anomaly scores.
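A minimal sketch of that label conversion, assuming a window is labeled anomalous if it contains at least one anomalous point (the function name and the exact windowing are assumptions, not the repository's code):

```python
# Minimal sketch: convert point-wise labels into window-based labels.
import numpy as np

def point_labels_to_window_labels(point_labels, window_size, time_step):
    point_labels = np.asarray(point_labels)
    window_labels = []
    for start in range(0, len(point_labels) - window_size + 1, time_step):
        window = point_labels[start:start + window_size]
        # A window counts as anomalous if any point inside it is anomalous.
        window_labels.append(int(window.any()))
    return np.array(window_labels)
```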
Because the KPI training set contains anomalies. Although the training sets of some datasets have no labeled anomalies, we found experimentally that the soft boundary is still more effective there; we suspect those training sets contain some noisy data.
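For reference, a soft-boundary one-class objective in the style of Deep SVDD looks roughly like the sketch below: 'nu' bounds the fraction of training points allowed to fall outside the hypersphere, which is why such an objective tolerates noisy or anomalous training samples. This is an illustrative sketch, not necessarily the exact loss used in this repository.

```python
# Illustrative soft-boundary one-class loss in the style of Deep SVDD.
import torch

def soft_boundary_loss(features, center, radius, nu):
    dist = torch.sum((features - center) ** 2, dim=1)    # squared distance to the center
    penalty = torch.clamp(dist - radius ** 2, min=0.0)   # only points outside the radius are penalized
    return radius ** 2 + (1.0 / nu) * torch.mean(penalty)
```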
Hi, sorry to bother you again, I still have one question about computing 'center_c'. When you compute 'center_c', and also during the training stage, you combine the original data with two augmented versions. Can I ask why? And for computing 'center_c' you divide by 2, 'c /= (2 * n_samples)', even though you tripled the data size. Can you explain this part to me?
We feed the original and augmented data into the model to get 'outputs' and 'dec', and then apply 'c /= (2 * n_samples)'.
When training the model, both the original data and the augmented data are used. So, naturally, the augmented data is also required when computing the center 'center_c'.
Emmm, I mean why divide by '2 * n_samples' ('c /= (2 * n_samples)') and not '3 * n_samples'?
```python
# Inside the center-computation loop: the original batch and its two
# augmented views are encoded together.
all_data = torch.cat((data, aug1, aug2), dim=0)   # 3B samples for a batch of size B
outputs, dec = model(all_data)                    # encoder / decoder features, 3B rows each
n_samples += outputs.shape[0]                     # accumulates 3B per batch
all_feature = torch.cat((outputs, dec), dim=0)    # 6B rows = 2 * (3B)
c += torch.sum(all_feature, dim=0)                # accumulate the feature sum
```
'all_data' already contains 'data', 'aug1', and 'aug2'. Both 'outputs' and 'dec' are used to calculate the center 'c', while 'n_samples' only counts the rows of 'outputs'. So 'c' needs to be divided by the extra factor of 2, i.e., by '2 * n_samples'.
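To make the bookkeeping explicit, here is the shape arithmetic behind that division, continuing the snippet above (assuming a per-batch size of B):

```python
# Shape arithmetic behind the final division, per batch of size B:
#   all_data     : 3B rows (data, aug1, aug2)
#   outputs, dec : 3B rows each
#   all_feature  : 6B rows = 2 * 3B
#   n_samples    : accumulates 3B per batch
# so dividing the accumulated sum by 2 * n_samples yields the mean feature.
c /= (2 * n_samples)
```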
Hi, I've been reading your work recently; it's interesting and has inspired me a lot, thank you very much!
I have some questions, could you please help me understand them?
I am focusing on the UCR dataset because my situation is the same: no anomalies in the training dataset. I'm also very new to this kind of dataset, because you treat each element of a time series as a single entity, whereas I usually treat one whole time-series sample as a single entity. So I cannot understand some parts of your implementation.
1) About the threshold part: each time-series sample has one score vector, and you choose the highest score as the threshold. However, there are many time-series samples, which yields a corresponding number of highest scores and hence thresholds. In this case, how is the final threshold defined? Or how can the threshold for the test dataset be obtained without knowing the test data?
My intuition is that it relates to these lines:

```python
test_affiliation, test_score, predict = ad_predict(test_target, test_score_origin, config.threshold_determine, config.detect_nu)
score_reasonable = tsad_reasonable(test_target, predict, config.time_step)
```

However, the input is 'test_score_origin', which is obtained from the test dataset. In evaluation mode, you didn't split the data into batches, which means the whole test set must have fewer samples than the batch size (512), but I still cannot understand how the threshold is determined when there are several samples in the test dataset. Or, in the UCR dataset, is there just one time series?
This part is a little abstract for me; could you please provide some explanation?
2) When you use 'MeanVarNormalize' for standardization, you also involve the test data: 'mvn.train(train_time_series_ts + test_time_series_ts)'. Won't this operation cause data leakage?