Closed: lihyin closed this issue 3 years ago
Yes, that could happen. Due to the exponential function, some anomalies are mapped to extremely large z values, so the mean score is often dominated by a few extremely large values. Another effect is that the model is not very stable in predicting the score for anomalies - as far as I observed, the scores of normal samples are quite stable. To be honest, I did not run many experiments on the dummy dataset; it is only meant as a toy example.
@marco-rudolph I have a related question. I also observed very large values of both anomaly_score and the gradients.
In the paper you mention that a threshold theta is chosen to decide whether the input contains an anomaly or not. Does this mean the choice of theta is quite arbitrary and could vary for different inputs? Would it make sense to use some kind of activation function like softmax to ensure the anomaly_score is in the range (0, 1)? That way, could we generally pick a number like 0.5 as the threshold theta?
Similarly, I have problems when the gradients are very large: it is hard to decide where the defects are in a general way, since there is no maximum value. Would a softmax-like activation function be a good idea here too? I feel it would have several benefits. One is a fixed value range of [0, 1] in the gradient map (a problem discussed in #2). Also, when I overlay the gradient map on the original image, I could potentially use the gradient values as the alpha parameter to set the transparency, which would help identify where the threshold is.
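To make the overlay idea concrete, here is a minimal sketch (not from the repo - the image and gradient arrays are made-up stand-ins) that squashes raw gradients into (0, 1) with a sigmoid and uses them as the alpha channel when blending a heatmap over the image:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))           # stand-in for the input image, values in [0, 1]
grad_map = rng.normal(0, 10, (64, 64))    # stand-in for raw gradients, possibly huge

alpha = sigmoid(grad_map)                 # squashed into (0, 1), usable as transparency
heat = np.zeros_like(image)
heat[..., 0] = 1.0                        # pure red heatmap

# Alpha-blend: pixels with strong gradients show red, weak ones show the image.
overlay = alpha[..., None] * heat + (1.0 - alpha[..., None]) * image
```

Since the overlay is a convex combination of two images in [0, 1], it stays in a valid display range regardless of how large the raw gradients get.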
I would say the choice of theta could vary for different datasets. One could have a validation set of non-anomalies to estimate which threshold should be chosen for a specific target false positive rate. Surely, it would not hurt to have a score between 0 and 1 - but in my opinion it does not really matter whether you set the threshold to 0.5 after softmax or to 0 before softmax. But you may be right that it feels more comfortable and familiar. Feel free to add an option which applies softmax to the score and the gradients.
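The validation-set approach above can be sketched in a few lines (synthetic scores here stand in for the scores of real normal samples): pick theta as the (1 - FPR)-quantile of the validation scores, so only the target fraction of normal samples ends up above it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for anomaly scores computed on a validation set of normal samples.
val_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)

target_fpr = 0.05
# theta = (1 - FPR)-quantile: ~5% of normal samples score above it
# and would be falsely flagged as anomalies.
theta = np.quantile(val_scores, 1.0 - target_fpr)

observed_fpr = np.mean(val_scores > theta)
```

At test time, any sample whose score exceeds theta is flagged; the threshold is dataset-specific but tied to an interpretable false positive rate rather than chosen arbitrarily.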
Sorry, I somehow mixed up softmax and sigmoid in my head... Please read my last post as if softmax were sigmoid. The problem with applying softmax to anomaly scores is that the (unknown) ratio of anomalies and the number of scores would have an impact on the softmax scores, which should not be the case.
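A small numeric example (made-up scores) of why the distinction matters: sigmoid maps each score independently, while softmax normalizes over the whole batch, so the value assigned to the very same score changes when more anomalies are present.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # shifted for numerical stability
    return e / e.sum()

scores_few_anoms = np.array([0.1, 0.2, 5.0])             # one large anomaly score
scores_many_anoms = np.array([0.1, 0.2, 5.0, 6.0, 7.0])  # same sample, more anomalies

# Sigmoid is per-sample: the mapped value of the score 5.0 is identical
# no matter which other scores are in the batch.
s_few = sigmoid(scores_few_anoms)[2]
s_many = sigmoid(scores_many_anoms)[2]

# Softmax normalizes over the set: the value assigned to the same
# score 5.0 shrinks drastically once larger scores join the batch.
p_few = softmax(scores_few_anoms)[2]
p_many = softmax(scores_many_anoms)[2]
```

So a per-sample squashing like sigmoid keeps the thresholding independent of the anomaly ratio, which is exactly the property softmax lacks.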
As per the screenshot, is it normal that the anomaly_score is super large? I was using the following config.py (just changed the device to cpu and set meta_epochs = 2) to train on the dummy_dataset. I get the same huge anomaly_score in the second epoch.