openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Task]: Grayscale / Singlechannel Image Support #2175

Open EliasBu opened 2 weeks ago

EliasBu commented 2 weeks ago

What is the motivation for this task?

I want to use anomalib to segment defects in huge (1500x1200 pixel) images. The images are grayscale, but the library and some models always expect the tensor to have 3 channels. For small images you can ignore this, but for big images it considerably increases compute time and GPU usage (as well as memory footprint), for no benefit at all.

Describe the solution you'd like

Native grayscale image support, in whatever form, for the Folder dataset (custom datasets) and for the models.

Additional context

No response

alexriedel1 commented 2 weeks ago

For feature-similarity-based methods like Patchcore or Padim, this might work by changing the pre-trained model's architecture. There are different ways to build a one-channel model from a three-channel model, for example by aggregating the weights of the first convolution:

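```python
import torch
import timm

model = timm.create_model('resnet50', pretrained=True)
# Sum the pre-trained RGB filters over the input-channel dimension: (64, 3, 7, 7) -> (64, 1, 7, 7)
conv1_agg_weight = model.conv1.weight.sum(dim=1, keepdim=True)
# Replace the stem convolution with a single-channel one using the same hyper-parameters
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
with torch.no_grad():
    model.conv1.weight.copy_(conv1_agg_weight)
```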
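Summing the filters is not an arbitrary choice: convolution is linear in the input channels, so when a grayscale image is replicated into three identical channels, the original three-channel convolution gives the same result as a single-channel convolution whose filter is the channel-wise sum. The sketch below checks this with a randomly initialised stand-in for the ResNet-50 stem (no pre-trained weights are downloaded, and the 64x64 input size is arbitrary):

```python
import torch

torch.manual_seed(0)

# Stand-in for a pre-trained stem: same shape as ResNet-50's conv1, but random weights
conv_rgb = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Single-channel conv whose filters are the RGB filters summed over the input channels
conv_gray = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    conv_gray.weight.copy_(conv_rgb.weight.sum(dim=1, keepdim=True))

gray = torch.rand(2, 1, 64, 64)   # batch of grayscale images
rgb = gray.expand(-1, 3, -1, -1)  # same images replicated to 3 channels (a view, no copy)

with torch.no_grad():
    out_rgb = conv_rgb(rgb)
    out_gray = conv_gray(gray)

print(out_rgb.shape, out_gray.shape)
print(torch.allclose(out_rgb, out_gray, atol=1e-5))
```

The match assumes the three channels are identical. If preprocessing applies per-channel normalization with different statistics (such as the ImageNet mean and standard deviation), the replicated channels are no longer identical after normalization, so the summed filter only approximates the original behaviour.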

Another way would be to keep only the filters for one of the three input channels:

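```python
import torch
import timm

model = timm.create_model('resnet50', pretrained=True)
# Filters of the first input channel; slicing with 0:1 keeps the channel dimension,
# giving shape (64, 1, 7, 7) instead of (64, 7, 7)
conv1_weight_channel = model.conv1.weight[:, 0:1, :, :]
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
with torch.no_grad():
    model.conv1.weight.copy_(conv1_weight_channel)
```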
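Both variants hard-code the hyper-parameters of ResNet-50's first convolution. A hypothetical helper that copies them from the existing layer instead (and carries over an optional bias) might look like this; `to_single_channel` and its `mode` argument are illustrative names, not part of anomalib or timm:

```python
import torch
from torch import nn


def to_single_channel(conv: nn.Conv2d, mode: str = "sum") -> nn.Conv2d:
    """Hypothetical helper: build a 1-input-channel copy of a 3-channel Conv2d.

    mode="sum"   -> aggregate the filters over the input channels
    mode="first" -> keep only the filters of the first input channel
    """
    if conv.in_channels != 3 or conv.groups != 1:
        raise ValueError("expected a standard 3-channel, non-grouped convolution")
    new_conv = nn.Conv2d(
        1,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
        padding_mode=conv.padding_mode,
    )
    with torch.no_grad():
        if mode == "sum":
            weight = conv.weight.sum(dim=1, keepdim=True)
        elif mode == "first":
            weight = conv.weight[:, 0:1]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        new_conv.weight.copy_(weight)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Randomly initialised stand-in for a ResNet stem
stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray_stem = to_single_channel(stem, mode="first")
with torch.no_grad():
    out = gray_stem(torch.rand(1, 1, 224, 224))
print(gray_stem.weight.shape, out.shape)
```

With a timm ResNet, `model.conv1 = to_single_channel(model.conv1)` would replace the stem. timm's `create_model` also accepts an `in_chans` argument that adapts the first convolution of pre-trained weights itself, which may be simpler where the backbone is created through timm.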

For reconstruction-based methods it's a bit different, however. For some you might only have to change the autoencoder architecture to accept one channel; for others, like EfficientAD, you would also have to alter the architecture of the corresponding pre-trained model.
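For the autoencoder case, the change can be as small as making the number of input and output channels a constructor argument. The sketch below is a hypothetical minimal convolutional autoencoder (`ConvAutoencoder` is an illustrative name, not one of anomalib's models); it only shows that a one-channel input yields a one-channel reconstruction and a one-channel per-pixel error map:

```python
import torch
from torch import nn


class ConvAutoencoder(nn.Module):
    """Hypothetical minimal conv autoencoder whose channel count is a parameter."""

    def __init__(self, in_channels: int = 1, hidden: int = 16) -> None:
        super().__init__()
        # Each stride-2 conv halves the spatial resolution
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Each transposed conv (k=4, s=2, p=1) doubles it again
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden * 2, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(hidden, in_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


model = ConvAutoencoder(in_channels=1)
images = torch.rand(4, 1, 128, 128)  # batch of grayscale images
with torch.no_grad():
    recon = model(images)
    # Squared per-pixel reconstruction error as a single-channel anomaly map
    anomaly_map = (images - recon).pow(2)
print(recon.shape, anomaly_map.shape)
```

Because the encoder halves the resolution twice, this particular sketch needs height and width divisible by 4 for the reconstruction to match the input size; 1500x1200 satisfies that.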