marco-rudolph / differnet

This is the official repository to the WACV 2021 paper "Same Same But DifferNet: Semi-Supervised Defect Detection with Normalizing Flows" by Marco Rudolph, Bastian Wandt and Bodo Rosenhahn.

Is it normal that anomaly_score is super large #3

Closed lihyin closed 3 years ago

lihyin commented 4 years ago

As per the screenshot, is it normal that anomaly_score is super large? I was using the following config.py (just changed to CPU and meta_epochs = 2) to train on the dummy_dataset. I get the same huge anomaly_score in the second epoch.

[screenshot: Capture2]

```python
'''This file configures the training procedure because handling arguments in every single function is so exhaustive for
research purposes. Don't try this code if you are a software engineer.'''

import torch

# device settings
device = 'cpu'  # 'cuda' or 'cpu'
if device == 'cuda':
    torch.cuda.set_device(0)  # only select a GPU when actually running on CUDA

# data settings
dataset_path = "dummy_dataset"
class_name = "dummy_class"
modelname = "dummy_test"

img_size = (448, 448)
img_dims = [3] + list(img_size)
add_img_noise = 0.01

# transformation settings
transf_rotations = True
transf_brightness = 0.0
transf_contrast = 0.0
transf_saturation = 0.0
norm_mean, norm_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

# network hyperparameters
n_scales = 3  # number of scales at which features are extracted; img_size is the highest - others are //2, //4, ...
clamp_alpha = 3  # see paper equation 2 for explanation
n_coupling_blocks = 8
fc_internal = 2048  # number of neurons in hidden layers of s-t-networks
dropout = 0.0  # dropout in s-t-networks
lr_init = 2e-4
n_feat = 256 * n_scales  # do not change unless you change the feature extractor

# dataloader parameters
n_transforms = 4  # number of transformations per sample in training
n_transforms_test = 64  # number of transformations per sample in testing
batch_size = 24  # actual batch size is this value multiplied by n_transforms(_test)
batch_size_test = batch_size * n_transforms // n_transforms_test

# total epochs = meta_epochs * sub_epochs
# evaluation after <sub_epochs> epochs
meta_epochs = 2
sub_epochs = 8

# output settings
verbose = True
grad_map_viz = True
hide_tqdm_bar = True
save_model = True
```
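For reference, a quick sketch of how the batch-size comments above work out numerically with these config values (the variable names here are just illustrative, not from the repo):

```python
n_transforms, n_transforms_test, batch_size = 4, 64, 24

# images seen per training step: each sample appears n_transforms times
train_images_per_step = batch_size * n_transforms  # 24 * 4 = 96

# batch_size_test is scaled down so a test step processes roughly the same
# number of transformed images as a training step
batch_size_test = batch_size * n_transforms // n_transforms_test  # 96 // 64 = 1
test_images_per_step = batch_size_test * n_transforms_test        # 1 * 64 = 64

print(train_images_per_step, batch_size_test, test_images_per_step)
```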
marco-rudolph commented 4 years ago

Yes, that can happen. Due to the exponential function, some anomalies are mapped to extremely large z values. The mean score is often dominated by a few extremely large scores. Another effect is that the model is not very stable in predicting the score for anomalies - as far as I observed, the scores of normal samples are quite stable. To be honest, I did not run many experiments on the dummy dataset; it is only meant as a toy example.
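A minimal sketch of why the score is unbounded, assuming (as in the paper) that the anomaly score is the mean negative log-likelihood over test transformations, which under a standard normal prior is proportional to the squared norm of z. A single latent vector far in the tail can dominate the mean (the sizes and the shift below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def anomaly_score(z):
    """Mean over test transformations of ||z||^2 / 2 (up to constants,
    the negative log-likelihood under a standard normal prior)."""
    return np.mean(np.sum(z ** 2, axis=1) / 2)

# normal sample: 64 transformations, latent vectors close to the prior
z_normal = rng.normal(0, 1, size=(64, 256))

# anomaly: the flow pushes it far into the tail; one large z dominates
# the mean, so the score has no upper bound
z_anomaly = rng.normal(0, 1, size=(64, 256))
z_anomaly[0] += 100.0

print(anomaly_score(z_normal))   # on the order of n_feat / 2
print(anomaly_score(z_anomaly))  # orders of magnitude larger, driven by one outlier
```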

jinzishuai commented 4 years ago

@marco-rudolph I have a related question. I also observed very large values of both anomaly_score and the gradients.

In the paper you mention there is a threshold theta chosen to decide whether the input contains an anomaly or not. Does this mean the choice of theta is quite arbitrary and could vary for different inputs? Would it make sense to use some kind of activation function like softmax to squash anomaly_score into the range (0, 1)? That way, could we generally pick a number like 0.5 as the threshold theta?

Similarly, I have problems when the gradients are very large: it is hard to decide where the defects are in a general way, since there is no maximum value. Would a softmax-like activation function be a good idea here too? I feel it would have several benefits. One is a fixed value range of [0, 1] in the gradient map (a problem discussed in #2). Also, when I overlay the gradient map on the original image, I could use the gradient values as the alpha parameter to set the transparency, which would help identify where the threshold is.

marco-rudolph commented 4 years ago

I would say the choice of theta can vary between datasets. One could use a validation set of non-anomalies to estimate which threshold to choose for a specific target false positive rate. Surely it would not hurt to have a score between 0 and 1 - but in my opinion it does not really matter whether you set the threshold to 0.5 after softmax or to 0 before softmax. But you may be right that it feels more comfortable and familiar. Feel free to add an option that applies softmax to the score and the gradients.
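The validation-set idea above can be sketched as picking theta from a quantile of the scores of defect-free samples; the function name and the score values below are hypothetical, not part of the repo:

```python
import numpy as np

def choose_threshold(val_scores_normal, target_fpr=0.05):
    """Pick theta as the (1 - target_fpr) quantile of anomaly scores
    on a validation set of defect-free samples, so roughly target_fpr
    of normal samples would be flagged as anomalous."""
    return float(np.quantile(val_scores_normal, 1.0 - target_fpr))

# hypothetical anomaly scores of defect-free validation images
rng = np.random.default_rng(1)
val_scores = rng.normal(loc=120.0, scale=10.0, size=500)

theta = choose_threshold(val_scores, target_fpr=0.05)
is_anomaly = lambda score: score > theta
print(theta, is_anomaly(300.0), is_anomaly(110.0))
```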

marco-rudolph commented 4 years ago

Sorry, I somehow mixed up softmax and sigmoid in my head... Read my last post as if "softmax" said "sigmoid". The problem with applying softmax to anomaly scores is that the (unknown) ratio of anomalies and the number of scores would influence the softmax outputs, which should not be the case.
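The distinction can be made concrete with a small sketch (toy score values, not from the model): softmax normalizes over the whole score set, so adding one more anomaly changes every output, while sigmoid maps each score independently into (0, 1):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # numerically stabilized softmax over the set
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-2.0, -1.5, 50.0])        # two normal samples, one anomaly
more = np.array([-2.0, -1.5, 50.0, 60.0])    # same scores plus a second anomaly

# softmax: the value assigned to the score 50.0 collapses once another,
# larger anomaly is added, because softmax normalizes over all scores
print(softmax(scores)[2], softmax(more)[2])

# sigmoid: each score is mapped independently, so the value for 50.0
# is identical in both sets
print(sigmoid(scores)[2], sigmoid(more)[2])
```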