qubvel / segmentation_models.pytorch

Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
https://smp.readthedocs.io/
MIT License
9.14k stars 1.63k forks

UNET is not translation invariant #831

Closed TariqBlecher closed 7 months ago

TariqBlecher commented 8 months ago

In my basic understanding, CNNs are translation invariant. However, when I pass a zero-valued image through a Unet (efficientnet-b5 backbone, pretrained on ImageNet), there is a lot of structure in the output. I guess it has something to do with the ImageNet pretraining, but as far as I can see, the pretraining normalisation and biases should be the same for all pixels in a given channel.

[figure: model output for an all-zero input, showing clear spatial structure]


import torch
import segmentation_models_pytorch as smp
from matplotlib import pyplot as plt

# Unet with an ImageNet-pretrained efficientnet-b5 encoder
Umodel = smp.Unet(
    encoder_name='efficientnet-b5',
    encoder_weights='imagenet',
    in_channels=1,
    classes=1,
    activation=None,
    encoder_depth=3,
    decoder_channels=(128, 64, 32),
    decoder_attention_type=None,
)

# all-zero input: a translation-invariant model should produce a flat output
inputimage = torch.zeros(1, 1, 80, 80)
Umodel.eval()
with torch.no_grad():
    output = Umodel(inputimage)

plt.imshow(output.squeeze())
plt.colorbar()
plt.show()
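For reference, the effect does not need a pretrained network at all. Below is a minimal sketch (not the library's code, and the weight/bias values are chosen purely for illustration): two zero-padded 3x3 convolutions on an all-zero input. The first conv outputs its bias everywhere, and the second conv then sees padded zeros at the border, so spatial structure appears from a blank image.

```python
import torch
import torch.nn as nn

# Two zero-padded 3x3 convs; values below are illustrative, not learned.
conv1 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
conv2 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
with torch.no_grad():
    conv1.bias.fill_(1.0)    # guarantees a nonzero constant feature map
    conv2.weight.fill_(0.1)  # deterministic weights for the arithmetic below
    conv2.bias.zero_()

    x = torch.zeros(1, 1, 16, 16)
    h = conv1(x)             # every pixel equals conv1's bias (1.0)
    y = conv2(h)[0, 0]

print(y[8, 8].item())  # interior: 9 taps x 0.1 x 1.0, approx 0.9
print(y[0, 8].item())  # edge: kernel overlaps padding, 6 taps, approx 0.6
print(y[0, 0].item())  # corner: 4 taps, approx 0.4
```

The interior is flat, but edges and corners differ because zero padding feeds the kernel different neighbourhoods there. Biases (and BatchNorm shifts) in a deep encoder turn a zero input into nonzero feature maps, and every padded conv after that imprints border structure, which a Unet decoder then upsamples across the image.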
TariqBlecher commented 7 months ago

OK, I implemented my own UNET and also found structure when passing in an array of zeros, although the structure was at a smaller scale.
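This is expected for any conv net that uses zero padding: strictly speaking, CNNs are translation *equivariant* only away from the borders, and padding breaks even that near the edges. A hedged sketch of a direct check, using a single padded conv rather than a full UNET: shift the input, run the conv, and compare against the shifted output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One zero-padded 3x3 conv; bias omitted to isolate the padding effect.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, 1, 32, 32)
x_shift = torch.roll(x, shifts=(2, 2), dims=(2, 3))  # shift input by 2 px
with torch.no_grad():
    y = conv(x)
    y_shift = conv(x_shift)
y_roll = torch.roll(y, shifts=(2, 2), dims=(2, 3))   # shift output by 2 px

# Equivariance holds in the interior...
interior_match = torch.allclose(y_roll[..., 4:-4, 4:-4],
                                y_shift[..., 4:-4, 4:-4], atol=1e-5)
# ...but fails over the full map, where padding (and the roll wrap-around)
# gives border pixels different neighbourhoods.
full_match = torch.allclose(y_roll, y_shift, atol=1e-5)
print(interior_match, full_match)
```

The same border effect compounds through every conv layer of a deep encoder, which is why the structure shows up even in a from-scratch UNET.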