lgvaz / faststyle

Fast style transfer

Investigate why u-net performs poorly with style transfer #5

Open · lgvaz opened this issue 4 years ago

lgvaz commented 4 years ago

Theoretically it should be way better than TransformerNet.

It performs really well for superres (which is almost the same problem), and it's a more appropriate architecture for image-to-image problems overall.

lgvaz commented 4 years ago

This paper gives a great explanation of why U-net might fail in some cases. Quoting from the paper:

The U-net is "lazy". That is to say, if the U-net finds itself able to handle a problem in its low-level layers, the high-level layers will not bother to learn anything. If we train a U-net on the very simple task of "copying an image" as in fig. 4, where the inputs and outputs are the same, the loss value will drop to 0 immediately, because the first layer of the encoder discovers that it can simply transmit all features directly to the last layer of the decoder via the skip connection to minimize the loss. In this case, no matter how many times we train the U-net, the mid-level layers will not get any gradient.
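Here's a minimal toy sketch of that effect (a hypothetical model, not code from this repo): a single skip connection from input to output is enough to solve the copy task, so the middle layers never need to learn anything.

```python
import torch
import torch.nn as nn

class TinySkipNet(nn.Module):
    "Toy 'U-net': a middle path plus one skip connection from input to output."
    def __init__(self):
        super().__init__()
        self.mid = nn.Sequential(                 # the layers the skip can bypass
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 3, 3, padding=1),
        )
        self.merge = nn.Conv2d(6, 3, 1)           # merges [skip, mid] channels

    def forward(self, x):
        return self.merge(torch.cat([x, self.mid(x)], dim=1))

net = TinySkipNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.rand(16, 3, 32, 32)

for _ in range(200):                              # train to copy the input
    loss = nn.functional.mse_loss(net(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The 1x1 merge conv can pass the skip channels straight through, so the loss
# collapses quickly; once it's near zero, the middle layers receive essentially
# no gradient and never had to learn anything.
print(f"final loss: {loss.item():.6f}")
mid_grad = sum(p.grad.abs().mean() for p in net.mid.parameters()).item()
print(f"mean |grad| reaching the middle layers: {mid_grad:.6f}")
```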

lgvaz commented 4 years ago

A strategy for solving the issue could be:

lgvaz commented 4 years ago

The paper talks about "guide decoders", although it doesn't explain in depth what they mean.

I think what I can try doing is generating an image from each middle layer without the skip connections (basically repeating the following layers, but without skip connections). This would produce an image for each middle layer, so the gradient is always present.
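A hedged sketch of that idea (all names here are hypothetical, not fastai API): each decoder stage emits, besides its normal skip-merged features, an auxiliary "guide" image decoded from the pre-skip features only, so a loss on that image sends gradient through the middle layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuideHead(nn.Module):
    "Decodes features into a full-size image, bypassing all skip connections."
    def __init__(self, in_ch, img_size):
        super().__init__()
        self.img_size = img_size
        self.conv = nn.Conv2d(in_ch, 3, 1)

    def forward(self, x):
        return F.interpolate(self.conv(x), size=self.img_size,
                             mode='bilinear', align_corners=False)

class BlockWithGuide(nn.Module):
    "A decoder block that also emits an image decoded *before* the skip merge."
    def __init__(self, in_ch, skip_ch, out_ch, img_size):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2)
        self.merge = nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1)
        self.guide = GuideHead(out_ch, img_size)

    def forward(self, x, skip):
        x = self.up(x)
        guide_img = self.guide(x)                          # skip-free path
        x = F.relu(self.merge(torch.cat([x, skip], dim=1)))
        return x, guide_img

# Hypothetical usage: one decoder stage producing features plus a guide image.
block = BlockWithGuide(in_ch=64, skip_ch=32, out_ch=32, img_size=(128, 128))
x, skip = torch.rand(1, 64, 16, 16), torch.rand(1, 32, 32, 32)
feats, guide_img = block(x, skip)                          # guide_img: (1, 3, 128, 128)
```

Training would then add one loss term per guide image, e.g. `loss = style_loss(out) + sum(w * style_loss(g) for g in guide_imgs)`, so every middle layer receives a gradient even when the skips dominate the main path.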

lgvaz commented 4 years ago

First try at modifying DynamicUnet failed miserably. Need to find a way to get the output of each UnetBlock with and without skip connections, and then use that to create multiple outputs from DynamicUnet.
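For the "with skip" half, plain PyTorch forward hooks might be enough without rewriting DynamicUnet at all; a sketch below, assuming fastai's DynamicUnet keeps its UnetBlocks as submodules (it does in v1 and v2). The "without skip" half would still need changes inside UnetBlock itself, since the skip merge happens internally.

```python
import torch
from fastai.vision.models.unet import DynamicUnet, UnetBlock

def collect_unet_block_outputs(unet: DynamicUnet, x: torch.Tensor):
    "Run `x` through `unet`, returning (final_output, [each UnetBlock's output])."
    outputs, handles = [], []

    def hook(module, inp, out):
        outputs.append(out)

    # Attach a temporary forward hook to every UnetBlock in the model.
    for m in unet.modules():
        if isinstance(m, UnetBlock):
            handles.append(m.register_forward_hook(hook))
    try:
        final = unet(x)
    finally:
        for h in handles:                  # always detach, even on error
            h.remove()
    return final, outputs

# Hypothetical usage with an existing model:
# final, block_outs = collect_unet_block_outputs(my_unet, img_batch)
```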