zhiyuanyou / UniAD

[NeurIPS 2022 Spotlight] A Unified Model for Multi-class Anomaly Detection
Apache License 2.0

Anomaly localization results are close, but anomaly detection results are different #6

zhanjw closed this issue 2 years ago

zhanjw commented 2 years ago

I used the configuration you provided:

./experiments/MVTec-AD/config.yaml

It does not contain the following entry:

  metrics:
    auc:
      - name: mean

I added it and tried to reproduce your results.

The AUROC in the mean column seems a bit low, especially for screw.

Are there any tips for training, or anything else I should be aware of?

clsname     mean      pixel     max       std
capsule     0.83566   0.98597   0.913841  0.871959
bottle      0.988095  0.97981   1         1
toothbrush  0.888889  0.983198  0.936111  0.972222
screw       0.539045  0.987008  0.913302  0.947325
transistor  0.919167  0.982308  0.99875   0.99625
wood        0.961404  0.930861  0.985965  0.980702
tile        0.994949  0.919262  0.993506  0.997835
hazelnut    0.995357  0.980919  1         0.997857
leather     1         0.987544  1         1
pill        0.832242  0.960712  0.945717  0.875614
grid        0.951546  0.972235  0.988304  0.993317
metal_nut   0.882209  0.932279  0.995112  0.969697
zipper      0.985557  0.974935  0.979254  0.982668
cable       0.923913  0.974855  0.957271  0.958958
carpet      0.886437  0.983814  0.997994  0.998796
mean        0.905631  0.969047  0.973675  0.969547
zhiyuanyou commented 2 years ago

Hi, your results are correct.

The mean, max, and std in the header row of your table are post-processing methods, while pixel is the pixel-level AUROC for anomaly localization. The localization result is an anomaly map of shape H x W, but detection needs a single scalar anomaly score for the whole image, so the map has to be reduced to a scalar. There are three options for this reduction: mean, max, and std.

In our paper, we use max for MVTec-AD and mean for CIFAR-10.
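A minimal sketch of the three reductions, assuming NumPy and an anomaly map already at H x W resolution. The function name image_score and the avgpool_size parameter are hypothetical; the repository's own max option may smooth the map before taking the maximum, so treat the pooling step here as an assumption rather than UniAD's exact code.

```python
import numpy as np


def image_score(anomaly_map: np.ndarray, method: str, avgpool_size: int = 1) -> float:
    """Reduce an H x W anomaly map to one image-level anomaly score.

    `image_score` and `avgpool_size` are hypothetical names used for illustration.
    """
    if anomaly_map.ndim != 2:
        raise ValueError("expected an H x W anomaly map")
    if method == "mean":
        # Average over every pixel, background included.
        return float(anomaly_map.mean())
    if method == "std":
        # Spread of the map: large when a few pixels stand out from the rest.
        return float(anomaly_map.std())
    if method == "max":
        k = avgpool_size
        if k > 1:
            # Assumed smoothing step: k x k mean filter, stride 1, no padding,
            # so a single noisy pixel cannot dominate the score.
            windows = np.lib.stride_tricks.sliding_window_view(anomaly_map, (k, k))
            anomaly_map = windows.mean(axis=(-2, -1))
        # Score of the most anomalous (optionally smoothed) location.
        return float(anomaly_map.max())
    raise ValueError(f"unknown method: {method!r}")


# Toy 8 x 8 map with a single anomalous pixel.
amap = np.zeros((8, 8))
amap[2, 3] = 1.0
for method in ("mean", "std", "max"):
    print(method, image_score(amap, method))
print("max with 2x2 pooling", image_score(amap, "max", avgpool_size=2))
```

The pooling window trades sensitivity to tiny defects for robustness to isolated noisy pixels; with avgpool_size=1 it reduces to a plain maximum.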

zhiyuanyou commented 2 years ago

As for your other question: why is mean so poor for screw? For screw, the foreground region occupies only a small part of the image, so mean post-processing averages in too many irrelevant background regions and dilutes the anomaly signal. Mean is therefore clearly not a good choice for screw.
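A toy illustration of the screw explanation, using made-up numbers rather than real UniAD outputs: each synthetic 64 x 64 map has a uniform background level that varies between images (standing in for irrelevant background responses), and anomalous images add a small 4 x 4 defect. The make_map and auroc helpers are hypothetical, written only for this sketch; auroc is a plain pairwise (Mann-Whitney) computation.

```python
import numpy as np

H = W = 64  # toy map size, chosen for illustration only


def make_map(background: float, defect: bool = False) -> np.ndarray:
    """Synthetic anomaly map: uniform background plus an optional small defect."""
    amap = np.full((H, W), background)
    if defect:
        amap[30:34, 30:34] = 1.0  # 4 x 4 defect: 16 of 4096 pixels
    return amap


def auroc(normal_scores, anomalous_scores) -> float:
    """Pairwise AUROC: chance an anomalous image outscores a normal one (ties count 0.5)."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else (0.5 if a == n else 0.0)
    return wins / (len(anomalous_scores) * len(normal_scores))


# Background level varies between images, mimicking irrelevant background responses.
normal = [make_map(b) for b in (0.10, 0.12, 0.14, 0.16)]
anomalous = [make_map(b, defect=True) for b in (0.09, 0.11, 0.13, 0.15)]

results = {}
for name, reducer in (("mean", np.mean), ("max", np.max)):
    results[name] = auroc(
        [float(reducer(m)) for m in normal],
        [float(reducer(m)) for m in anomalous],
    )
    print(f"{name}: AUROC = {results[name]:.3f}")
# mean: AUROC = 0.375
# max: AUROC = 1.000
```

Because the defect covers only 16 of 4096 pixels, it raises the mean by about 0.003, less than the spread between background levels, so mean ends up ranking images mostly by background; max ignores the background and separates the two groups perfectly.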

zhiyuanyou commented 2 years ago

Therefore, your final results are 96.9 for localization (the mean of the pixel column) and 97.4 for detection (the mean of the max column), which are even better than those in our paper (we trained with 8 GPUs, which usually gives poorer results than 1 or 2 GPUs).

We will also add an explanation of mean, max, and std to the README.
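The 96.9 and 97.4 figures are the averages of the per-class pixel and max AUROCs in the table above (its mean row), expressed as percentages. A short check using the values copied from that table:

```python
from statistics import mean

# Per-class AUROCs copied from the table in this issue.
pixel = {  # pixel-level AUROC (localization)
    "capsule": 0.98597, "bottle": 0.97981, "toothbrush": 0.983198,
    "screw": 0.987008, "transistor": 0.982308, "wood": 0.930861,
    "tile": 0.919262, "hazelnut": 0.980919, "leather": 0.987544,
    "pill": 0.960712, "grid": 0.972235, "metal_nut": 0.932279,
    "zipper": 0.974935, "cable": 0.974855, "carpet": 0.983814,
}
image_max = {  # image-level AUROC with max post-processing (detection)
    "capsule": 0.913841, "bottle": 1.0, "toothbrush": 0.936111,
    "screw": 0.913302, "transistor": 0.99875, "wood": 0.985965,
    "tile": 0.993506, "hazelnut": 1.0, "leather": 1.0,
    "pill": 0.945717, "grid": 0.988304, "metal_nut": 0.995112,
    "zipper": 0.979254, "cable": 0.957271, "carpet": 0.997994,
}

# Unweighted average over the 15 classes, scaled to percent.
localization = 100 * mean(list(pixel.values()))
detection = 100 * mean(list(image_max.values()))
print(f"localization: {localization:.1f}, detection: {detection:.1f}")
# localization: 96.9, detection: 97.4
```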

18894269590 commented 9 months ago

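Hello, author. We trained on MVTec-AD with the settings from the paper (bs=64, backbone efficientnet_b4). However, when using mean_max_auc as the key_metric and evaluating the ckpt_best.pth.tar saved over 1000 epochs, we obtained the metrics below, and many of them fall short of the numbers reported in the paper. Could you explain why this happens? We look forward to your reply.

[image]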