In my basic understanding, CNNs are translation invariant. However, when I pass an all-zero image through a U-Net (EfficientNet-B5 backbone, pretrained on ImageNet), there is a lot of spatial structure in the output. I guess it has something to do with the ImageNet pretraining, but as far as I can see, the pretrained normalisation and biases should be the same for all pixels within a given channel.
import torch
import segmentation_models_pytorch as smp
from matplotlib import pyplot as plt

# U-Net with an ImageNet-pretrained EfficientNet-B5 encoder
Umodel = smp.Unet(
    encoder_name='efficientnet-b5',
    encoder_weights='imagenet',
    in_channels=1,
    classes=1,
    activation=None,
    encoder_depth=3,
    decoder_channels=(128, 64, 32),
    decoder_attention_type=None,
)

# all-zero input image
inputimage = torch.zeros(1, 1, 80, 80)

Umodel.eval()  # eval mode: BatchNorm uses running stats, dropout is off
with torch.no_grad():
    output = Umodel(inputimage)

plt.imshow(output.squeeze().numpy())
plt.colorbar()
plt.show()
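To sanity-check my reasoning, I also tried a minimal two-layer example (my own sketch, not taken from any library docs). On an all-zero input, a single convolution behaves as I expect: its output is just the bias, constant within each channel. But as soon as a second zero-padded convolution sees that constant, non-zero map, the padded zeros at the border produce different values than the interior, so spatial structure appears:

import torch
import torch.nn as nn

torch.manual_seed(0)

conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # bias=True by default
conv2 = nn.Conv2d(4, 1, kernel_size=3, padding=1)

x = torch.zeros(1, 1, 8, 8)
with torch.no_grad():
    h = conv1(x)  # all-zero input: h[:, c] equals conv1.bias[c] everywhere
    y = conv2(h)  # zero padding of the constant map breaks uniformity at the border

print(h[0, 0])  # constant map
print(y[0, 0])  # interior is constant, but border rows/columns differ

If that is right, each additional layer would push the border effect further into the interior, which might explain why a deep network with downsampling shows structure across the whole 80x80 output.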
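As for the normalisation part of my claim: in eval mode, BatchNorm should be a per-channel affine map, so by itself it cannot create spatial structure. A quick check (again my own sketch, with made-up running stats):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.eval()  # eval mode: use running_mean/running_var, not batch statistics

# fill the running stats and affine parameters with non-trivial values
with torch.no_grad():
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 2.0)
    bn.bias.uniform_(-1.0, 1.0)

x = torch.zeros(1, 3, 5, 5)
with torch.no_grad():
    y = bn(x)

# y[:, c] = (x[:, c] - mean[c]) / sqrt(var[c] + eps) * weight[c] + bias[c],
# so a spatially constant input stays spatially constant in every channel
print(y[0].flatten(1).std(dim=1))  # per-channel spatial std: all zeros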