Closed — mrgloom closed this issue 5 years ago
CoordinateChannel adds extra input features for the CNN to learn from. It is equivalent to giving an RGBXY "image" as input to the CNN.
Having done that, it makes no sense to add additional XY channels to subsequent layers: those layers have already learned from and extracted any useful information from that input. Therefore there is no reason to add more CoordinateChannel layers after the first one.
Think of it this way: you don't resize and concatenate your original RGB image into subsequent layers, so why do so with the coordinate channel information?
All that being said, I see no harm (other than a slight increase in parameters and training time) in adding more of these layers deeper in the network. My guess is that it won't help at all, but I can't say that with certainty.
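For intuition, the coordinate-channel trick that such a layer performs before the first convolution can be sketched in plain NumPy (the helper name is mine, not the library's; normalizing coordinates to [-1, 1] follows the CoordConv paper):

```python
import numpy as np

def add_coord_channels(batch):
    """Append normalized x/y coordinate channels to an NHWC image batch.

    This mimics what a coordinate-channel layer does: every pixel gets
    its own (x, y) position appended as two extra feature channels.
    """
    n, h, w, _ = batch.shape
    # Coordinates normalized to [-1, 1], as in the CoordConv paper.
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")      # each of shape (h, w)
    coords = np.stack([xx, yy], axis=-1)             # (h, w, 2)
    coords = np.broadcast_to(coords, (n, h, w, 2))   # tile over the batch
    return np.concatenate([batch, coords], axis=-1)  # RGB -> RGBXY

rgb = np.random.rand(4, 8, 8, 3).astype(np.float32)
rgbxy = add_coord_channels(rgb)
print(rgbxy.shape)  # (4, 8, 8, 5)
```

An ordinary Conv2D applied to this RGBXY tensor can then learn position-dependent filters, which a plain convolution cannot.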
If it should only be used as the first layer, maybe it would be more efficient not to have a custom layer at all and just feed RGBXY to an ordinary Conv2D, i.e. do it on the batch-generator side?
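Doing it on the generator side would indeed be straightforward — a minimal sketch wrapping an existing batch generator (the wrapper and dummy generator names are illustrative, not from any library):

```python
import numpy as np

def with_coord_channels(batch_gen):
    """Wrap a batch generator so every NHWC image batch comes out RGBXY."""
    for images, labels in batch_gen:
        n, h, w, _ = images.shape
        # Per-pixel coordinates normalized to [-1, 1].
        yy, xx = np.meshgrid(np.linspace(-1.0, 1.0, h),
                             np.linspace(-1.0, 1.0, w), indexing="ij")
        coords = np.broadcast_to(np.stack([xx, yy], axis=-1), (n, h, w, 2))
        yield np.concatenate([images, coords], axis=-1), labels

# Illustrative dummy generator yielding (images, labels) batches.
def dummy_gen():
    for _ in range(2):
        yield np.zeros((2, 4, 4, 3)), np.zeros(2)

batch, _ = next(with_coord_channels(dummy_gen()))
print(batch.shape)  # (2, 4, 4, 5)
```

The trade-off is that a layer keeps the trick inside the model (so it survives model serialization and works at inference without a special pipeline), while the generator approach keeps the model itself completely standard.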
Interestingly, it looks like the idea of adding x, y input planes predates the CoordConv paper: in "Automatic Portrait Segmentation for Image Stylization" the authors use a similar trick: http://xiaoyongshen.me/webpage_portrait/index.html
Is there any intuition about the usage of the CoordConv layer? The architecture descriptions in the paper are not detailed, i.e. should we replace all convolutions in an existing model (for example VGG16) with CoordConv layers, or just the first convolution?