rverschuren opened this issue 1 year ago
First experiment
Name: `3knaoywc`
Ref: https://wandb.ai/stpm-unet/STPM/runs/2yvg43sg
We changed the code so that no augmentations are performed:
```python
self.transforms = nn.Sequential(
    # RandomBoxBlur(kernel_size=(2, 2), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(3, 3), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(5, 5), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(7, 7), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(5, 5), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(15, 15), border_type='reflect', p=0.1),
    # RandomAffine(degrees=45.0, scale=(1, 2), padding_mode=2, p=.75),
    # ColorJiggle(0.1, 0.1, 0.1, 0.1, p=1.),
)
```
Run command: `python train_resnet.py --phase train --category carpet --num_epochs 50`
Now let's try to find the optimal threshold by looking at the score distributions of the false-positive and false-negative samples.
Starting from the default threshold of 0.00097:
For false negatives (5 samples):
Mean: 0.0007572558047292703
Median: 0.0008210714216521516
Max: 0.000880734446970702
Min: 0.0004892590844086252

For false positives (1 sample):
Mean: 0.0012743603461189026
Median: 0.0012743603461189026
Max: 0.0012743603461189026
Min: 0.0012743603461189026
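One simple way to turn these score distributions into a threshold (a dependency-free sketch, not necessarily the exact procedure used in the repo) is to sweep every observed score as a candidate threshold and keep the one that maximizes balanced accuracy:

```python
def find_threshold(scores_normal, scores_anomalous):
    """Sweep observed scores as candidate thresholds and return the one
    maximizing balanced accuracy (mean of TNR and TPR).

    Convention: a sample is predicted anomalous when score >= threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(scores_normal + scores_anomalous):
        tnr = sum(s < t for s in scores_normal) / len(scores_normal)
        tpr = sum(s >= t for s in scores_anomalous) / len(scores_anomalous)
        acc = 0.5 * (tnr + tpr)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```

With well-separated distributions this lands between the two clusters, which is consistent with the 0.00082 value found here sitting between the false-negative and false-positive means.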
After optimizing the threshold (0.00082), we obtain the following confusion matrix:
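For completeness, a minimal sketch of how such a confusion matrix can be computed from raw scores (assuming label 1 = anomalous and the score >= threshold convention):

```python
def confusion_matrix(labels, scores, threshold):
    """Return TP/FP/TN/FN counts.

    labels: 1 = anomalous, 0 = normal.
    A sample is predicted anomalous when its score >= threshold.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        pred_anomalous = s >= threshold
        if pred_anomalous and y == 1:
            counts["tp"] += 1
        elif pred_anomalous and y == 0:
            counts["fp"] += 1
        elif not pred_anomalous and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts
```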
Doing the same for the leather dataset, the threshold found is 0.021.
We clearly observe lower performance than on carpet, which makes sense since the model was trained on carpet and we do not use augmentations.
Note the difference in AUC:
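The AUC values reported here can be recomputed from raw scores; ROC AUC equals the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one (the Mann-Whitney statistic). A minimal sketch:

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) formulation: the fraction of
    (anomalous, normal) pairs ranked correctly, counting ties as half.

    labels: 1 = anomalous, 0 = normal.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The same function covers both image-level AUC (one score per image) and pixel-level AUC (one score per pixel, with pixel-wise ground-truth masks flattened into the label list).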
Let's now try ResNet34 as the backbone for the teacher and ResNet18 for the student.
Training of the model: https://wandb.ai/stpm-unet/STPM/runs/2mutqklv
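As a reminder of what the teacher and student are compared on: STPM scores each spatial location by the distance between the L2-normalized teacher and student features at that location (equivalently, their cosine distance). A dependency-free sketch of the per-location score, with feature vectors given as plain lists; the exact scaling and multi-scale aggregation in the repo's implementation may differ:

```python
import math

def location_score(t_feat, s_feat):
    """Anomaly score for one spatial location:
    0.5 * || t/||t|| - s/||s|| ||^2, which equals 1 - cos(t, s).

    t_feat, s_feat: teacher and student feature vectors at that location.
    """
    t_norm = math.sqrt(sum(x * x for x in t_feat))
    s_norm = math.sqrt(sum(x * x for x in s_feat))
    return 0.5 * sum((a / t_norm - b / s_norm) ** 2
                     for a, b in zip(t_feat, s_feat))
```

With a ResNet34 teacher and ResNet18 student, the compared feature maps come from matching stages of the two backbones (projected to the same channel count where they differ).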
We compared the image-level and pixel-level AUC for ResNet34 and ResNet18 after finding a suitable threshold. ResNet18 performed better than ResNet34, suggesting that the smaller architecture is more effective for this task. Note that for this experiment the model was trained on the carpet dataset and tested on the unseen leather dataset, which lets us evaluate how well the model generalizes to a new anomaly task.
When testing on the semantically similar anomaly task of carpet (which the model was trained on), we found no significant difference in performance between ResNet18 and ResNet34. This suggests that, for this specific task and without augmentations, both models are equally effective when the test data is semantically similar to the training data.
Additionally, there is a clear advantage of using ResNet18 as it is a smaller and lighter model, making it more efficient and easier to deploy in practical applications.
I'm performing these experiments on the model without using augmentations, and I'll keep tracking the steps of my experiments in this issue.