In my basic understanding, CNNs are translation invariant. However, when I pass an all-zero image through a U-Net (EfficientNet-B5 backbone, pretrained on ImageNet), there is a lot of spatial structure in the output. I guess it has something to do with the ImageNet pretraining, but as far as I can see, the pretrained normalisation and biases should be the same for all pixels within a given channel.
import torch
import segmentation_models_pytorch as smp
from matplotlib import pyplot as plt

# U-Net with an ImageNet-pretrained EfficientNet-B5 encoder
Umodel = smp.Unet(
    encoder_name='efficientnet-b5',
    encoder_weights='imagenet',
    in_channels=1,
    classes=1,
    activation=None,
    encoder_depth=3,
    decoder_channels=(128, 64, 32),
    decoder_attention_type=None,
)

# all-zero input image
inputimage = torch.zeros(1, 1, 80, 80)

Umodel.eval()  # eval mode: BatchNorm uses running stats, dropout is off
with torch.no_grad():
    output = Umodel(inputimage)

plt.imshow(output.squeeze().numpy())
plt.colorbar()
plt.show()
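To sanity-check my reasoning, I also tried a minimal two-layer example (my own sketch, not taken from any library docs). On an all-zero input, a single convolution behaves as I expect: its output is just the bias, constant within each channel. But as soon as a second zero-padded convolution sees that constant, non-zero map, the padded zeros at the border produce different values than the interior, so spatial structure appears:

import torch
import torch.nn as nn

torch.manual_seed(0)

conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # bias=True by default
conv2 = nn.Conv2d(4, 1, kernel_size=3, padding=1)

x = torch.zeros(1, 1, 8, 8)
with torch.no_grad():
    h = conv1(x)  # all-zero input: h[:, c] equals conv1.bias[c] everywhere
    y = conv2(h)  # zero padding of the constant map breaks uniformity at the border

print(h[0, 0])  # constant map
print(y[0, 0])  # interior is constant, but border rows/columns differ

If that is right, each additional layer would push the border effect further into the interior, which might explain why a deep network with downsampling shows structure across the whole 80x80 output.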
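As for the normalisation part of my claim: in eval mode, BatchNorm should be a per-channel affine map, so by itself it cannot create spatial structure. A quick check (again my own sketch, with made-up running stats):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.eval()  # eval mode: use running_mean/running_var, not batch statistics

# fill the running stats and affine parameters with non-trivial values
with torch.no_grad():
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 2.0)
    bn.bias.uniform_(-1.0, 1.0)

x = torch.zeros(1, 3, 5, 5)
with torch.no_grad():
    y = bn(x)

# y[:, c] = (x[:, c] - mean[c]) / sqrt(var[c] + eps) * weight[c] + bias[c],
# so a spatially constant input stays spatially constant in every channel
print(y[0].flatten(1).std(dim=1))  # per-channel spatial std: all zeros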