Closed: sandeepvvn closed this issue 5 years ago
Do you mean the sequences like 5 22 in hdfs_test_abnormal?
Suppose I train on the "5 22" sequence by putting it in the normal data. It is still flagged as a false positive.
I have done some experiments, but to reach a conclusion I need an answer to this basic question: what is an ideal window size, and how do we determine it from the data?
Why is it a false positive?
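One rough, data-driven starting point (a sketch, not a rule from this project): for each candidate window size, measure what fraction of sessions are too short to produce even one (window, next key) pair. The function name `short_session_fraction` and the sample sessions are made up for illustration, and the sketch assumes each session is a list of integer log keys.

```python
def short_session_fraction(sessions, window_size):
    """Fraction of sessions with fewer than window_size + 1 keys.

    Such sessions cannot yield a single (window, next key) pair
    without some form of padding.
    """
    if not sessions:
        return 0.0
    short = sum(1 for s in sessions if len(s) < window_size + 1)
    return short / len(sessions)


# Illustrative sessions only; real ones would be parsed from the log files.
sessions = [[5, 22], [5, 5, 22, 11, 9], [5, 22, 11, 9, 11, 9, 26], [22]]

for w in (1, 2, 3, 5):
    print(w, short_session_fraction(sessions, w))
```

If a large share of sessions falls below the candidate window size, that size will either skip them or force padding, which is worth knowing before tuning anything else.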
This happens when the data contains sequences shorter than a typical window size of 3. These sequences are flagged as anomalies even though they are already present in the training data. Making the window size smaller than 3 doesn't make sense for syslog data, but some processes emit only 1, 2, or 3 logs over time, and the LSTM treats them as anomalies.
How can we handle these sequences without generating more false positives?
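One possible way to handle this, sketched under assumptions: left-pad every sequence with a reserved padding key so that short sequences such as "5 22" still produce (window, next key) pairs, and apply the same padding at training and detection time. Everything here is hypothetical: `PAD = 0` assumes key 0 is not a real log key, and the lookup table in `fit` is only a stand-in for the LSTM so that the example runs on its own. It is not this repository's code.

```python
from collections import defaultdict

PAD = 0  # hypothetical padding key; must not collide with a real log key


def padded_windows(seq, window_size, pad=PAD):
    """Yield (window, next_key) pairs, left-padded so every key is a target."""
    padded = [pad] * window_size + list(seq)
    for i in range(len(seq)):
        yield tuple(padded[i:i + window_size]), padded[i + window_size]


def fit(sessions, window_size):
    """Stand-in for model training: remember which keys followed each window."""
    seen = defaultdict(set)
    for s in sessions:
        for window, nxt in padded_windows(s, window_size):
            seen[window].add(nxt)
    return seen


def is_anomalous(seq, seen, window_size):
    """Flag a sequence if any key was never seen after its padded window."""
    return any(
        nxt not in seen.get(window, ())
        for window, nxt in padded_windows(seq, window_size)
    )


seen = fit([[5, 22], [5, 5, 22, 11]], window_size=3)
print(is_anomalous([5, 22], seen, 3))  # short sequence present in training
print(is_anomalous([5, 9], seen, 3))   # short sequence not in training
```

With an actual LSTM, the same `padded_windows` pairs would be the training inputs and targets, so the model can learn that "5" can follow an all-padding window and "22" can follow a window ending in "5".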