Closed: filozyu closed this issue 6 years ago
Hi Zheyu,
Thank you for your interest in our research! It's nice to have your feedback!
1) As in the original DAGMM paper (Zong et al., 2018) and the DSEBM paper (Zhai et al., 2016), the labels are switched for this anomaly detection task. This is probably why you were confused...
2) Yes, indeed. This is an interesting question: in general, we found that unsupervised anomaly detection can be very challenging. How can one define a threshold for the anomaly score without any knowledge? This remains a question that very few recent papers have tackled. So what we did was reproduce the same experimental setup as the DAGMM paper (Zong et al., 2018) and likewise assumed this knowledge about the test set. You can think of it as knowing the proportion of people with a certain disease in a population when doing anomaly detection on medical images or in any other health application...
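Concretely, that assumption turns into a simple percentile rule on the anomaly scores. Here is a minimal sketch of the idea (made-up names and dummy scores, not the repository code):

```python
import numpy as np

def label_by_known_proportion(scores, anomaly_ratio=0.2):
    """Flag the top `anomaly_ratio` fraction of anomaly scores as anomalous (1)."""
    threshold = np.percentile(scores, 100.0 * (1.0 - anomaly_ratio))
    return (scores >= threshold).astype(int)

# Dummy usage: pretend we know 20% of the test set is anomalous.
rng = np.random.default_rng(0)
test_scores = rng.normal(size=500)
y_pred = label_by_known_proportion(test_scores, anomaly_ratio=0.2)
```

The catch is of course that the anomaly_ratio is exactly the knowledge a fully unsupervised detector would not have.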
If anyone has ideas for this challenge in unsupervised anomaly detection, please share them; this could actually be a nice contribution to future research!
Hi Houssam,
Thank you for replying to my questions.
Yes indeed, the normal data are treated as the anomalous class in this task, as suggested in the two papers, to make the ratio of anomalies roughly 20% of the whole data. However, neither paper gives further justification for flipping the labels other than making the anomalies a minority compared to the normal class. The DSEBM paper also shows in its experiments that performance degrades when the ratio of anomalous data is high. But I guess the algorithm trained on swapped labels can still be used to predict intrusions in KDD99, since it classifies attack/non-attack in the end.
As for the second question, the training data are actually attacks, which I would guess are more different from one another than non-attacks are. So I don't know whether this would cause problems if we just picked, say, the 95th percentile of the training scores as the threshold, since the scores can vary a lot.
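Something like this is what I have in mind (just a sketch with dummy scores, nothing from your code):

```python
import numpy as np

rng = np.random.default_rng(0)
train_scores = rng.normal(size=1000)  # stand-in for anomaly scores on the training data
test_scores = rng.normal(size=200)    # stand-in for anomaly scores on the test data

# Fix the threshold from the training scores only, e.g. their 95th percentile,
# instead of using the known anomaly ratio of the test set.
threshold = np.percentile(train_scores, 95)
y_pred = (test_scores >= threshold).astype(int)  # 1 = anomalous, 0 = normal
```

My worry is that if the training scores are very spread out, this percentile may not transfer well to the test distribution.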
Thanks for your questions!
Regarding your questions about the experimental setup, we again followed the same setup as the original DAGMM paper. I suggest contacting its authors as well to get their opinion; they replied to some of our questions.
Apart from that, on setting a threshold from a practical point of view: the fully unsupervised experimental setup looks a bit unrealistic to me. In an industry application, you would more likely have a lot of normal training data and very few labeled anomalous samples. With these data you could train your network without the labels, but build a small validation set (containing both normal and anomalous data) from the few labels to choose a threshold (for example, by maximizing the F1 score, or any other metric more relevant to your context). Do note this setup would not be fully "unsupervised", yet it is still more "unsupervised" than some other "semi-supervised" anomaly detection techniques.
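To make that concrete, here is a small sketch of the idea (a hypothetical helper, not from our repository): sweep candidate thresholds over the labeled validation scores and keep the one that maximizes F1.

```python
import numpy as np
from sklearn.metrics import f1_score

def pick_threshold(val_scores, val_labels, n_candidates=100):
    """Pick the anomaly-score threshold maximizing F1 on a labeled validation set.

    val_labels: 1 = anomalous, 0 = normal.
    """
    candidates = np.quantile(val_scores, np.linspace(0.0, 1.0, n_candidates))
    f1s = [f1_score(val_labels, (val_scores >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(f1s))]

# Dummy usage: a few labeled validation scores are enough to fix the threshold.
rng = np.random.default_rng(0)
val_scores = np.concatenate([rng.normal(0, 1, 95), rng.normal(4, 1, 5)])
val_labels = np.array([0] * 95 + [1] * 5)
threshold = pick_threshold(val_scores, val_labels)
# y_pred_test = (test_scores >= threshold).astype(int)
```

You could of course swap f1_score for whatever metric matters most in your context.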
As those questions are no longer related to the code itself, I'll be glad to answer them in separate email threads! Please do not hesitate to include the other authors and me in the thread so that we can all get a better picture of the questions and give our opinions.
Houssam
Hello Houssam,
First of all, I'd like to thank you for sharing the code and for the amazing paper you and your fellow researchers produced. I have a few questions regarding the implementation; it would be great if you could take some time to look at them.
1) In kdd.py, label 1 is assumed for anomalous data and label 0 for normal data throughout the code, but in the function _get_dataset the anomalous data are assigned label 0 and the normal data label 1 (lines 46-47), which confused me; these two lines also contradict the comment on line 128. I reversed the labels in these two lines (keeping the other settings intact: cross-entropy loss, L1 norm in the reconstruction loss) and the F1 score dropped drastically (see the toy example below). Please correct me if I am wrong.
2) In BiGAN_run_kdd.py, when evaluating the anomalous data in the test set, you use the fact that 20% of the test data are anomalous and assign the anomalous label to the samples with the top 20% of scores (lines 319-328). Is that allowed in anomaly detection? I think an anomaly detection algorithm should be able to detect unknown anomalies.
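A side note on the F1 drop in question 1 (the toy example mentioned above): I realize part of it may simply be that F1 is not symmetric in which class is treated as positive, so flipping labels on imbalanced data changes the score even for the same predictions:

```python
# Toy illustration, unrelated to the repository code.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1] * 20 + [0] * 80)   # 20% positives
y_pred = np.array([1] * 15 + [0] * 85)   # detector catches 15 of the 20
print(f1_score(y_true, y_pred))          # ~0.857
print(f1_score(1 - y_true, 1 - y_pred))  # ~0.970, a different value
```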
Thank you for your contribution, it's exciting work!
Zheyu