Closed — kareldb closed this issue 6 months ago
Hi, you mention using faulty cases for training, so just a heads-up from my side: these are ignored in unsupervised training.
About the edges in the pictures: if those are rare in the training set, they are probably what's causing trouble. You either need more of them in the training set, or some preprocessing to avoid this.
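One simple form of such preprocessing is to suppress predictions in a fixed border band of the anomaly map before thresholding. A minimal sketch (my own helper, not an anomalib API; the `margin` value is a hypothetical tuning parameter), assuming the anomaly map is a 2D NumPy array:

```python
import numpy as np

def mask_borders(anomaly_map: np.ndarray, margin: int = 16) -> np.ndarray:
    """Zero out a border band where plate edges dominate the predictions.

    `margin` (in pixels) is a tuning parameter, not an anomalib setting.
    """
    masked = anomaly_map.copy()
    masked[:margin, :] = 0.0
    masked[-margin:, :] = 0.0
    masked[:, :margin] = 0.0
    masked[:, -margin:] = 0.0
    return masked

# Example: a 64x64 map with a spurious hot corner and a real interior defect.
amap = np.zeros((64, 64))
amap[0, 0] = 5.0       # corner artifact
amap[32, 32] = 3.0     # genuine defect
cleaned = mask_borders(amap, margin=4)
print(cleaned.max())   # 3.0 -- the interior defect survives, the corner is zeroed
```

This only makes sense if real defects never touch the plate edges; otherwise cropping the images beforehand is the safer option.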
As for the false positives, I'm not too sure. It could be many things, but in theory this shouldn't really happen unless the samples are vastly different. I'd say make sure your training set is diverse enough if you have different kinds of wood, or split by wood type to avoid big fluctuations. For example, PaDiM works by fitting a multivariate Gaussian on the features extracted from the wood, so if a test image differs significantly in wood color from what the training set covered, it will be scored as a large deviation.
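To make that concrete, here is a toy sketch of the Gaussian-fitting idea (plain NumPy on random "features", not PaDiM's actual patch-wise implementation): samples that drift away from the training distribution get a large Mahalanobis distance, which is exactly what happens with an unseen wood color.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "features" from normal training patches (e.g., backbone embeddings).
train_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 8))

# Fit the multivariate Gaussian: mean vector and (regularized) covariance.
mean = train_feats.mean(axis=0)
cov = np.cov(train_feats, rowvar=False) + 1e-3 * np.eye(8)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x: np.ndarray) -> float:
    """Distance of a feature vector from the fitted Gaussian."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

in_dist = rng.normal(0.0, 1.0, size=8)   # looks like the training wood
shifted = rng.normal(5.0, 1.0, size=8)   # e.g., a much darker wood color
print(mahalanobis(in_dist) < mahalanobis(shifted))  # True
```

So a normal plank of an unseen wood type can legitimately outscore a real defect, which is why splitting by wood type (or covering all types in training) matters.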
For unsupervised methods, this kind of problem can be quite challenging, since the defects seem to be in-distribution. Try EfficientAD or PatchCore; those two models are quite powerful. Also pay attention to the anomaly scores, not only the anomaly maps, as the scores can be more reliable. If the wood colors differ significantly, maybe also try some augmentations.
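On scores versus maps: a common convention (which I'm sketching here generically, not quoting anomalib's internals) is to derive the image-level score as the peak of the anomaly map and threshold that, so one spurious pixel matters less than when eyeballing the heatmap:

```python
import numpy as np

def image_score(anomaly_map: np.ndarray) -> float:
    # Common convention: the image-level score is the peak of the map.
    return float(anomaly_map.max())

maps = {
    "clean_board": np.full((32, 32), 0.1),
    "scratched_board": np.where(np.eye(32) > 0, 0.9, 0.1),  # diagonal scratch
}
scores = {name: image_score(m) for name, m in maps.items()}
threshold = 0.5  # hypothetical value, tuned on a validation split
preds = {name: s > threshold for name, s in scores.items()}
print(preds)  # {'clean_board': False, 'scratched_board': True}
```

The threshold is the part worth tuning on a held-out split; anomalib also computes adaptive thresholds for you, so compare against those rather than picking one by eye.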
If none of the above works, you could give supervised training a shot, but that doesn't necessarily mean the results will be better. Also, we don't have any supervised methods in Anomalib, so you'd sadly lose quite a few useful features by switching to other tools.
Hope this helps.
Thanks for the heads up!
1) I have a dataset of about 100k background images, but the kernel crashes when I try to train on that many. 2) I will try training a model for each type of wood, so it won't see vastly different background samples. 3) I will try PatchCore and EfficientAD. 4) I started with the supervised approach, but it didn't work well since there aren't enough faulty samples.
Cool, let us know how it goes. Regarding the 100k samples: that really is a lot. For PaDiM and PatchCore it's way too much; EfficientAD should work, though one epoch would take ages. If there are multiple kinds of wood, splitting is probably best, so do tell how that works out.
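If you go the per-wood-type route, the data prep is just regrouping files into one folder tree per type. A small sketch (the `<woodtype>_<id>.png` filename convention is a hypothetical assumption; adapt the parsing to however your dataset encodes the type):

```python
import tempfile
from pathlib import Path

def split_by_wood_type(src: Path, dst: Path) -> dict:
    """Copy images into dst/<woodtype>/good/, one subtree per wood type.

    Assumes filenames look like '<woodtype>_<id>.png' (hypothetical).
    Returns a count of images per wood type.
    """
    counts: dict[str, int] = {}
    for img in sorted(src.glob("*.png")):
        wood_type = img.name.split("_")[0]
        out_dir = dst / wood_type / "good"
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / img.name).write_bytes(img.read_bytes())
        counts[wood_type] = counts.get(wood_type, 0) + 1
    return counts

# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "raw", Path(tmp) / "split"
    src.mkdir()
    for name in ["oak_001.png", "oak_002.png", "birch_001.png"]:
        (src / name).write_bytes(b"")
    counts = split_by_wood_type(src, dst)
print(counts)  # {'birch': 1, 'oak': 2}
```

Each resulting `<woodtype>/good/` folder can then back its own training run, so no model ever sees a vastly different background.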
Regarding the supervised approaches, I had supervised anomaly detection in mind, not just classification nets. There are some methods, although I don't know of one (at the moment) that works all that well. Maybe BGAD, but in my experience its masks are quite noisy.
How do you train on your own dataset with Anomalib? I've tried for a long time to run the code with my own dataset and it doesn't work. Which parts of the source code have you changed?
Objective
I'm a thesis student at KU Leuven (Belgium) trying to do anomaly detection for wood veneer.
System information
OS: Windows 11 Python version: 3.9 Anomalib version: 0.7.0
Problem
For training, I use 2,500 background and 2,500 faulty images. After that, inference is run on a close-to-real-world test dataset with 20,000 background and 2,000 faulty images.
Training goes well, but when I run inference on the test dataset, the performance is really low.
While the performance on the training dataset (with the built in split methods) is:
AUROC: 0.905
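(For reference, this image-level AUROC can be reproduced from the raw scores with a small rank-based helper. This is my own sketch, not an anomalib function; it computes the probability that a random faulty image outscores a random background image, with ties counted as 0.5.)

```python
import numpy as np

def auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Rank-based AUROC for binary labels (1 = faulty, 0 = background)."""
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):        # average ranks over tied scores
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return float(
        (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    )

labels = np.array([0, 0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
print(auroc(scores, labels))  # 1.0 -- every faulty image outranks every background one
```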
When I inspect the images classified as faulty, I get the following insights:
1) The characteristic of the wood type is seen as an anomaly.
2) The corners of the wooden plate are always classified as faulty.
This explains the low accuracy, even though the model can still detect the actual faults.
Code used
1) for training
2) for inference
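(The actual scripts are omitted here. For context, anomalib 0.x is typically driven by a YAML config passed to `tools/train.py`. Below is a rough sketch of what the dataset section for a custom folder dataset might look like; the field names and values are assumptions based on the 0.7-era Folder format, so double-check them against the sample configs shipped with your installed version.)

```yaml
dataset:
  name: wood_veneer        # hypothetical dataset name
  format: folder
  root: ./datasets/wood    # hypothetical path to the image folders
  normal_dir: good         # background images
  abnormal_dir: defect     # faulty images
  task: segmentation
  image_size: 256
  train_batch_size: 32
  eval_batch_size: 32
```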
Models used
I tried almost every other supported model as well; no spectacular improvements.
Question
I've read a lot of papers on anomaly detection and think unsupervised learning is the way to go for this problem. Am I overlooking something? Is Anomalib the right tool for this job?
Thanks in advance!