shendu-ht / SLA-VAE

A Semi-Supervised VAE Based Active Anomaly Detection Framework in Multivariate Time Series for Online Systems
https://dl.acm.org/doi/pdf/10.1145/3485447.3511984

Question about the feature_ext #4

Open cutespider opened 1 year ago

cutespider commented 1 year ago

In the single-KPI case, each extract function contains the line `x_window = x_window[:-1]`. If `idx` starts from 0, `x_window` has length 1, so `x_window[:-1]` is an empty list and the following while loop never terminates. Is this a bug, or is `idx` simply not allowed to start from 0?

cutespider commented 1 year ago

In the following code, the while loop body runs only if `x_window` is empty, but in that case `x_window += x_window` leaves it empty and the loop never terminates. There may be a problem here.

```python
if dim == 1:
    window_obs = np.zeros(shape=(idx_len, 1, self.sample_size))
    cur_obs = np.zeros(shape=(idx_len, 1, 1))

    # Due to the low complexity of window extraction, no need for performance optimization.
    for i in range(idx_len):
        # the last value is cur_obs
        x_window = p_window_ext(x, pre_window, post_window, day_window, idx[i]).tolist()
        cur_obs[i, 0, 0] = x_window[-1]

        x_window = x_window[:-1]
        while len(x_window) < self.sample_size and not x_window:
            x_window += x_window
```
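To make the failure mode concrete: doubling an empty list leaves it empty, so a loop guarded by `not x_window` cannot exit once it is entered. Below is a minimal sketch of a guarded padding step, assuming the intent was to repeat the available history until it reaches `sample_size`. `pad_window` is a hypothetical helper written for illustration and is not a function from this repository.

```python
def pad_window(x_window, sample_size):
    """Repeat x_window until it holds at least sample_size values.

    Hypothetical helper for illustration; not part of SLA-VAE.
    """
    if not x_window:
        # An empty list can never grow by self-concatenation, so fail fast
        # instead of looping forever.
        raise ValueError("x_window is empty; need at least one past observation")
    padded = list(x_window)  # copy so the caller's list is not modified
    while len(padded) < sample_size:
        padded += padded
    return padded


# Doubling an empty list is a no-op, which is why the original loop cannot exit.
empty = []
empty += empty
print(len(empty))

# A non-empty window grows until it reaches the requested size.
print(len(pad_window([1.0, 2.0, 3.0], sample_size=8)))
```

Raising on an empty window is only one option; returning a sentinel or skipping the index would also avoid the hang, depending on how the caller wants to handle missing history.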

shendu-ht commented 1 year ago

Thanks for your valuable advice. We do check the input x_window before extracting time series features. Without enough observations, we will not execute the service.

Still, your point is valid, and we need to improve the logic here. Thanks.

cutespider commented 1 year ago

Thanks for your reply. Is the input check included in this project? I didn't see it. Could you share the check code? Thanks.

shendu-ht commented 1 year ago

In practice, when detecting anomalies in online streaming data, the window of the input time series should be longer than 1h for the detection results to be meaningful. This project does not include that logic, but it is simple to implement with a configurable threshold:

dataframe.shape[0] < Threshold

This logic is embedded in the online platform with a threshold configuration. The anomaly detection service only takes effect when the window of historical time series data is larger than the configured threshold.
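A minimal sketch of such a guard, assuming the threshold is expressed as a time span rather than a row count (the row-count form above works the same way with `len(...)`). The name `has_enough_history` and the one-hour default are assumptions drawn from the "longer than 1h" guidance, not code from the online platform.

```python
from datetime import datetime, timedelta

# Assumed default, taken from the "longer than 1h" guidance in this thread.
MIN_HISTORY = timedelta(hours=1)


def has_enough_history(timestamps, min_history=MIN_HISTORY):
    """Return True when the observations span at least min_history.

    Hypothetical helper for illustration; not part of SLA-VAE.
    """
    if len(timestamps) < 2:
        # A single point (or none) has no measurable span.
        return False
    return max(timestamps) - min(timestamps) >= min_history


start = datetime(2024, 1, 1, 0, 0)
short_window = [start + timedelta(minutes=i) for i in range(30)]
long_window = [start + timedelta(minutes=i) for i in range(61)]

print(has_enough_history(short_window))
print(has_enough_history(long_window))
```

A caller would skip feature extraction and detection whenever the check returns `False`.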

shendu-ht commented 1 year ago

Calculating the frequency of the input time series is a similar preprocessing step. Online time series come at multiple frequencies, such as 30s, 1min, and 2min, and a window that is too short cannot yield a correct frequency. In this project, the input window size is checked in utils_timestamp.
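As an illustration of why a short window is a problem for frequency estimation, here is one common approach: take the median gap between consecutive timestamps and refuse to answer when there are too few points. This is a sketch under stated assumptions, not the code in utils_timestamp; `infer_frequency_seconds` and `min_points` are hypothetical names.

```python
from datetime import datetime, timedelta
from statistics import median


def infer_frequency_seconds(timestamps, min_points=3):
    """Estimate the sampling interval as the median gap between timestamps.

    Hypothetical helper for illustration; not part of SLA-VAE.
    """
    if len(timestamps) < min_points:
        # Too few points to tell a real interval from a gap caused by missing data.
        raise ValueError("window too short to estimate a frequency")
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return median(gaps)


start = datetime(2024, 1, 1)
# 30s series with one missing observation at the 90s mark.
series = [start + timedelta(seconds=s) for s in (0, 30, 60, 120, 150)]
print(infer_frequency_seconds(series))
```

The median is used here because it tolerates a few missing observations; with only two points, a single dropped sample would double the estimated interval.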

cutespider commented 1 year ago

OK, I see, thanks a lot. By the way, line 69 of https://github.com/shendu-ht/SLA-VAE/blob/main/src/common/utils_timestamp.py may have a small problem: the converted timestamp does not correspond to the original datetime. You could use datetime.datetime.timestamp(idx) instead of idx.timestamp().
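One possible explanation, not confirmed in this thread, is a time zone mismatch for naive datetimes. In the standard library, `timestamp()` on a naive `datetime` interprets the value in the machine's local time zone. If `idx` is a `pandas.Timestamp`, `idx.timestamp()` is understood to treat a naive value as UTC instead, which would produce a different number; that is an assumption about what line 69 receives. The stdlib-only sketch below shows the two interpretations and a round trip that stays consistent by making the zone explicit.

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 1, 12, 0, 0)

# Naive datetimes are interpreted in the machine's local time zone,
# so this value depends on where the code runs.
local_ts = naive.timestamp()

# Attaching an explicit zone makes the result independent of the machine.
utc_ts = naive.replace(tzinfo=timezone.utc).timestamp()

# Converting back with the same zone recovers the original wall-clock time.
restored = datetime.fromtimestamp(utc_ts, tz=timezone.utc).replace(tzinfo=None)
print(restored == naive)

# The two interpretations differ by the local UTC offset (zero on a UTC machine).
print(local_ts - utc_ts)
```

Whichever convention is chosen, using the same one for both directions of the conversion keeps the timestamp and the datetime in agreement.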