Open EliasBu opened 2 weeks ago
For feature-similarity-based methods like PatchCore or PaDiM this might work by changing the pre-trained model's architecture. There are different ways to build a one-channel model from a three-channel model, for example aggregating the weights of the first convolution:
```python
import torch
import timm

model = timm.create_model('resnet50', pretrained=True)

# Sum the weights over the three input channels, keeping the channel dim
conv1_agg_weight = model.conv1.weight.sum(dim=1, keepdim=True)
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model.conv1.weight.data = conv1_agg_weight
```
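As a sanity check, the aggregation trick can be verified with a plain `Conv2d` standing in for `model.conv1` (a sketch with hypothetical layer sizes, since the equivalence only depends on the convolution's linearity): the one-channel layer with summed weights produces the same output as feeding the grayscale image replicated across three channels to the original layer.

```python
import torch

torch.manual_seed(0)

# Stand-in for the pre-trained first conv: a small 3-channel conv without bias
conv_rgb = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)

# One-channel conv whose weights are the per-input-channel sum of conv_rgb's weights
conv_gray = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1, bias=False)
conv_gray.weight.data = conv_rgb.weight.sum(dim=1, keepdim=True)

x = torch.randn(1, 1, 16, 16)  # grayscale input
with torch.no_grad():
    y_replicated = conv_rgb(x.repeat(1, 3, 1, 1))  # gray image copied to 3 channels
    y_gray = conv_gray(x)

# By linearity of the convolution the two outputs match (up to float rounding)
print(torch.allclose(y_replicated, y_gray, atol=1e-5))
```

This also shows why the aggregated model behaves like the common workaround of stacking the grayscale image into three channels, but without the tripled input size.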
Another way would be to use just one of the three input channels:
```python
model = timm.create_model('resnet50', pretrained=True)

# Keep only the first channel's weights; the 0:1 slice preserves
# the (64, 1, 7, 7) shape expected by the one-channel conv
conv1_weight_channel = model.conv1.weight[:, 0:1, :, :]
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model.conv1.weight.data = conv1_weight_channel
```
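Note that, unlike the aggregation above, keeping only one channel's weights generally does not reproduce the original layer's response to a replicated grayscale image, since the other two channels' contributions are dropped. A quick sketch with a stand-in `Conv2d` (hypothetical sizes, not the actual ResNet layer):

```python
import torch

torch.manual_seed(0)

conv_rgb = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)

# One-channel conv that keeps only the first input channel's weights;
# the 0:1 slice preserves the (out, 1, kH, kW) weight shape
conv_gray = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1, bias=False)
conv_gray.weight.data = conv_rgb.weight.data[:, 0:1, :, :]

x = torch.randn(1, 1, 16, 16)
with torch.no_grad():
    y_replicated = conv_rgb(x.repeat(1, 3, 1, 1))
    y_gray = conv_gray(x)

# Channels 1 and 2 no longer contribute, so the outputs differ
print(torch.allclose(y_replicated, y_gray))
```

So this variant changes the feature statistics more than the aggregation approach does, which may matter for feature-similarity methods.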
For reconstruction-based methods it's a bit different, however. For some you might only have to change the autoencoder architecture to accept one channel; for others, like EfficientAD, you would also have to alter the corresponding pre-trained model's architecture.
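For the autoencoder case, the change usually amounts to making the first encoder layer's `in_channels` and the last decoder layer's `out_channels` configurable. A minimal hypothetical sketch (a toy architecture, not anomalib's actual models):

```python
import torch


class GrayscaleAutoencoder(torch.nn.Module):
    """Toy convolutional autoencoder; set in_channels=1 for grayscale input."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
        )
        self.decoder = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(16, in_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


model = GrayscaleAutoencoder(in_channels=1)
x = torch.randn(2, 1, 64, 64)  # batch of single-channel images
out = model(x)
print(out.shape)  # reconstruction has the same single-channel shape as the input
```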
What is the motivation for this task?
I want to use anomalib to segment defects in large (1500x1200 pixel) images. The images are grayscale, but the library and some models always expect a tensor with 3 channels. For small images you can ignore this, but for big images it increases compute and GPU usage (as well as memory consumption) a lot, for no benefit at all.
Describe the solution you'd like
Native grayscale image support, in any form, for the Folder datamodule (custom datasets) and the models.
Additional context
No response