rverschuren opened this issue 1 year ago
First experiment
Name: `3knaoywc`
Ref: https://wandb.ai/stpm-unet/STPM/runs/2yvg43sg
We changed the code so that no augmentations are performed:
```python
self.transforms = nn.Sequential(
    # RandomBoxBlur(kernel_size=(2, 2), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(3, 3), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(5, 5), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(7, 7), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(5, 5), border_type='reflect', p=0.2),
    # RandomBoxBlur(kernel_size=(15, 15), border_type='reflect', p=0.1),
    # RandomAffine(degrees=45.0, scale=(1, 2), padding_mode=2, p=.75),
    # ColorJiggle(0.1, 0.1, 0.1, 0.1, p=1.),
)
```
Run command: `python train_resnet.py --phase train --category carpet --num_epochs 50`
Now let's try to find the optimal threshold by looking at the score distributions of the false-positive and false-negative samples.
Starting from the default threshold of 0.00097:
For false negatives (5 samples):
Mean: 0.0007572558047292703
Median: 0.0008210714216521516
Max: 0.000880734446970702
Min: 0.0004892590844086252

For false positives (1 sample):
Mean: 0.0012743603461189026
Median: 0.0012743603461189026
Max: 0.0012743603461189026
Min: 0.0012743603461189026
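One simple way to turn these score distributions into a threshold (a dependency-free sketch, not necessarily the exact procedure used in the repo) is to sweep every observed score as a candidate threshold and keep the one that maximizes balanced accuracy:

```python
def find_threshold(scores_normal, scores_anomalous):
    """Sweep observed scores as candidate thresholds and return the one
    maximizing balanced accuracy (mean of TNR and TPR).

    Convention: a sample is predicted anomalous when score >= threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(scores_normal + scores_anomalous):
        tnr = sum(s < t for s in scores_normal) / len(scores_normal)
        tpr = sum(s >= t for s in scores_anomalous) / len(scores_anomalous)
        acc = 0.5 * (tnr + tpr)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```

With well-separated distributions this lands between the two clusters, which is consistent with the 0.00082 value found here sitting between the false-negative and false-positive means.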
After optimizing the threshold (0.00082), we obtain the following confusion matrix:
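For completeness, a minimal sketch of how such a confusion matrix can be computed from raw scores (assuming label 1 = anomalous and the score >= threshold convention):

```python
def confusion_matrix(labels, scores, threshold):
    """Return TP/FP/TN/FN counts.

    labels: 1 = anomalous, 0 = normal.
    A sample is predicted anomalous when its score >= threshold.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        pred_anomalous = s >= threshold
        if pred_anomalous and y == 1:
            counts["tp"] += 1
        elif pred_anomalous and y == 0:
            counts["fp"] += 1
        elif not pred_anomalous and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts
```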
Doing the same for the leather dataset, the threshold found is 0.021.
We clearly observe lower performance than on carpet, which makes sense since the model was trained on carpet and we do not use augmentations.
Note the difference in AUC:
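The AUC values reported here can be recomputed from raw scores; ROC AUC equals the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one (the Mann-Whitney statistic). A minimal sketch:

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) formulation: the fraction of
    (anomalous, normal) pairs ranked correctly, counting ties as half.

    labels: 1 = anomalous, 0 = normal.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The same function covers both image-level AUC (one score per image) and pixel-level AUC (one score per pixel, with pixel-wise ground-truth masks flattened into the label list).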
Let's now try ResNet34 as the backbone for the teacher and ResNet18 for the student.
Training of the model: https://wandb.ai/stpm-unet/STPM/runs/2mutqklv
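As a reminder of what the teacher and student are compared on: STPM scores each spatial location by the distance between the L2-normalized teacher and student features at that location (equivalently, their cosine distance). A dependency-free sketch of the per-location score, with feature vectors given as plain lists; the exact scaling and multi-scale aggregation in the repo's implementation may differ:

```python
import math

def location_score(t_feat, s_feat):
    """Anomaly score for one spatial location:
    0.5 * || t/||t|| - s/||s|| ||^2, which equals 1 - cos(t, s).

    t_feat, s_feat: teacher and student feature vectors at that location.
    """
    t_norm = math.sqrt(sum(x * x for x in t_feat))
    s_norm = math.sqrt(sum(x * x for x in s_feat))
    return 0.5 * sum((a / t_norm - b / s_norm) ** 2
                     for a, b in zip(t_feat, s_feat))
```

With a ResNet34 teacher and ResNet18 student, the compared feature maps come from matching stages of the two backbones (projected to the same channel count where they differ).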
We compared the image-level and pixel-level AUC for ResNet34 and ResNet18 after finding a suitable threshold. ResNet18 performed better than ResNet34, suggesting that the smaller architecture is more effective for this task. Note that for this experiment the model was trained on the carpet dataset and tested on the unseen leather dataset, which lets us evaluate how well the model generalizes to a new anomaly task.
When testing on the semantically similar anomaly task of carpet (which the model was trained on), we found no significant difference in performance between ResNet18 and ResNet34. This suggests that, for this specific task and without augmentations, both models are equally effective when the test data is semantically similar to the training data.
Additionally, there is a clear advantage of using ResNet18 as it is a smaller and lighter model, making it more efficient and easier to deploy in practical applications.
I'm performing these experiments on the model without using augmentations, and I'll keep tracking the steps of my experiments in this issue.