lgvaz opened 4 years ago
This paper gives a great explanation of why a U-net might fail in some cases. Quoting from the paper:
The U-net is "lazy". That is to say, if the U-net finds itself able to handle a problem in its low-level layers, the high-level layers will not bother to learn anything. If we train a U-net on a very simple task, "copying an image", as in fig. 4, where the inputs and outputs are the same, the loss value will drop to 0 immediately, because the first layer of the encoder discovers that it can simply transmit all features directly to the last layer of the decoder through the skip connection to minimize the loss. In this case, no matter how many times we train the U-net, the mid-level layers will not get any gradient.
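To make the effect concrete, here is a minimal PyTorch sketch (mine, not from the paper) of the "lazy U-net" on the copy task: a single skip connection lets the decoder copy the input directly, so the middle layer's gradient tends to shrink as the loss drops:

```python
import torch
import torch.nn as nn

class TinyUnet(nn.Module):
    """A toy 3-layer 'U-net' with one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, padding=1)
        self.mid  = nn.Conv2d(16, 16, 3, padding=1)  # "mid-level" layer
        self.dec1 = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        e1 = torch.relu(self.enc1(x))
        m  = torch.relu(self.mid(e1))
        # skip connection: e1 goes straight to the decoder, so the net
        # can learn to copy x while ignoring `m` entirely
        return self.dec1(torch.cat([e1, m], dim=1))

net = TinyUnet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    x = torch.rand(8, 3, 32, 32)
    loss = nn.functional.mse_loss(net(x), x)  # "copy image" task
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())                           # drops close to 0
# in line with the paper's claim, the mid layer's gradient tends to be
# much smaller than enc1/dec1's once the skip path does all the work
print(net.mid.weight.grad.abs().mean())
```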
A possible strategy for solving this:
The paper mentions "guide decoders", though it doesn't explain in depth what they mean.
What I can try is generating an image at each middle layer without the skip connections (basically repeating the following layers, but skip-free). This would produce an image for each middle layer, so those layers always receive a gradient (see the sketch below).
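A rough sketch of that idea (names like `GuideDecoder` are mine, not from the paper): attach a small skip-free decoder head to a mid-level feature map and add an image-level loss on its output, so gradient always flows through the middle layers:

```python
import torch
import torch.nn as nn

class GuideDecoder(nn.Module):
    """Upsamples a mid-level feature map straight to an image, no skips."""
    def __init__(self, in_ch, out_ch=3, scale=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode='bilinear',
                        align_corners=False),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
        )

    def forward(self, feats):
        return self.head(feats)

# hypothetical usage, assuming the model is changed to also return the
# mid-level feature map:
#   main_out, mid_feats = model(x)
#   guide = GuideDecoder(in_ch=256, scale=8)
#   loss = crit(main_out, y) + 0.1 * crit(guide(mid_feats), y)
```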
First try at modifying DynamicUnet failed miserably. Need to find a way to get the output of each UnetBlock, with and without skip connections, and then use that to create multiple outputs from DynamicUnet.
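One way to at least grab each UnetBlock's (post-skip) output without rewriting DynamicUnet is plain PyTorch forward hooks; the skip-free variant would still need changes inside UnetBlock itself. A sketch, assuming fastai v1 names and that `data` is an existing DataBunch:

```python
import torch
from fastai.vision import unet_learner, models
from fastai.vision.models.unet import UnetBlock

learn = unet_learner(data, models.resnet34)  # `data`: your DataBunch

# collect every UnetBlock in the decoder
blocks = [m for m in learn.model.modules() if isinstance(m, UnetBlock)]

outputs = []
hooks = [b.register_forward_hook(lambda mod, inp, out: outputs.append(out))
         for b in blocks]

learn.model.eval()
with torch.no_grad():
    device = next(learn.model.parameters()).device
    learn.model(torch.randn(1, 3, 224, 224).to(device))

print([o.shape for o in outputs])  # one feature map per UnetBlock

for h in hooks:
    h.remove()
```

From there, the captured feature maps could be fed to guide decoders like the one sketched above to build the multiple outputs.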
Theoretically DynamicUnet should be way better than TransformerNet. It performs really well for superres (which is almost the same problem), and it's a more appropriate architecture for image-to-image tasks overall.