openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: EfficientAd is slower than other models in anomalib #2150

Open haimat opened 1 week ago

haimat commented 1 week ago

Describe the bug

I had the impression that the EfficientAd model would be among the fastest in anomalib in terms of prediction time. To verify that, I trained three models (Padim, Fastflow, and EfficientAd) on the same training data at an image size of 512x512 pixels. I then wrote a small script that loads each model, warms up the GPU, and runs prediction on 100 images. I measure only the model forward time, excluding image loading and any pre- or post-processing.
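For reference, a timing harness along these lines (a sketch, not the exact script from this report; the warmup count and the synchronization points are my assumptions) might look like:

```python
import time
import torch

def benchmark_forward(model, images, warmup=10, device="cuda"):
    """Average forward-pass time over a list of image tensors.

    `model` and `images` are assumed to already live on `device`;
    timing excludes loading and any pre-/post-processing.
    """
    model.eval()
    with torch.no_grad():
        # Warm up so one-time costs (kernel compilation, caching) don't skew the average
        for _ in range(warmup):
            model(images[0])
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for img in images:
            model(img)
        if device == "cuda":
            # CUDA calls are asynchronous: wait for queued kernels before stopping the clock
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / len(images)
```

Without the `torch.cuda.synchronize()` calls, the measured time on GPU would mostly reflect kernel launch overhead rather than the actual forward pass.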

With the models exported to ONNX I get these results (avg. model forward times over 100 images):

In other words: EfficientAd is the slowest of these three and Padim the fastest; I expected it to be the other way round. Am I missing something, or is this a bug in anomalib?

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

I trained three models on the same dataset, then predict 100 images with each of them and measure the avg. model forward / inferencing time, without pre- or post-processing.

OS information


Expected behavior

I would expect the EfficientAd net to be considerably faster than the other models.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

-

Logs

-

Code of Conduct

alexriedel1 commented 1 week ago

Hi, can you show how you measure the timing? In plain PyTorch on 256x256 images I get the following speed measurements on a GTX 1660 Ti:

Padim: 6.4ms
EfficientAD S: 64.1ms
Fastflow: 33ms

When measuring this implementation of EfficientAD, which claims to reach the timing reported in the paper, I get the same speed of 64ms per image on my GPU. This makes me think that EfficientAD in anomalib isn't slower than it should be.

The authors of EfficientAD state: "For each method, we remove unnecessary parts for the timing, such as the computation of losses during inference, and use float16 precision for all networks. Switching from float32 to float16 for the inference of EfficientAD does not change the anomaly detection results for the 32 anomaly detection scenarios evaluated in this paper. In latency-critical applications, padding in the PDN architecture of EfficientAD can be disabled. This speeds up the forward pass of the PDN architecture by 80 µs without impairing the detection of anomalies. We time EfficientAD without padding and therefore report the anomaly detection results for this setting in the experimental results of this paper."

So you should be sure to set padding=False and use half precision. Half precision especially matters on some kinds of GPUs.
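As an illustration in plain PyTorch (not anomalib's API; the tiny stand-in model and the autocast usage are my assumptions), half-precision inference can be sketched like this, with padding=0 in the convolutions mirroring the authors' "disable padding" suggestion:

```python
import torch

# Tiny stand-in model; the real case would be the trained EfficientAD module.
# padding=0 mirrors the "disable padding in the PDN" suggestion from the paper.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=0),
    torch.nn.ReLU(),
).eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = torch.randn(1, 3, 64, 64, device=device)

# float16 on GPU; CPU autocast supports bfloat16 instead
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=dtype):
    out = model(x)
```

For an exported model (ONNX/TensorRT), half precision is typically enabled at conversion time rather than in the PyTorch code.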

alexriedel1 commented 1 week ago

I was curious and ran some more experiments. Half precision really matters, for example on a T4 GPU. "Anomalib EfficientAD" refers to the anomalib implementation; "nelson" refers to this implementation.


256 x 256 image size
Anomalib EfficientAD S full precision 24ms
Anomalib EfficientAD S half precision 8.9ms
nelson EfficientAD S full precision 21.5ms
nelson EfficientAD S half precision 7.4ms

Anomalib Fastflow half precision Resnet18 25.23ms
Anomalib Fastflow full precision Resnet18 23.37ms

512 x 512 image size
Anomalib EfficientAD S full precision 161ms
Anomalib EfficientAD S half precision 30ms
nelson EfficientAD S full precision 153ms
nelson EfficientAD S half precision 27ms

Anomalib Fastflow half precision Resnet18 26.1ms
Anomalib Fastflow full precision Resnet18 24.9ms

Anomalib Fastflow full precision Resnet50 108ms
Anomalib Fastflow half precision Resnet50 52ms

What I take from these results (and it isn't big news): half precision matters especially for convolution-heavy models, image size matters, and the choice of GPU matters. The EfficientAD authors may not have made an entirely fair comparison between models, since I suspect they did not use half precision for all the other methods they compare their inference speed against.

haimat commented 1 week ago

@alexriedel1 Thanks for your response. I will try to reproduce your experiments and get back here to you soon!

haimat commented 1 day ago

@alexriedel1 Looking at your testing results, at the 512x512 image size the EfficientAd model is slower than the full-precision Fastflow (ResNet-18) model even with half precision. That is quite a surprise ...

I have exported my models to ONNX and then converted them to TensorRT on NVIDIA, with half precision enabled for the latter.

How did you turn half-precision mode on or off? Also, how can you choose the ResNet backbone for the Fastflow model?