ycjing / Neural-Style-Transfer-Papers

:pencil2: Neural Style Transfer: A Review

Is DIN using MobileNet V1 vs. V3? #11

Closed · nile649 closed 4 years ago

ycjing commented 4 years ago

Hi @nile649,

Thank you for your interest in our work. In our paper, DIN uses MobileNet V1, but it should also work with other architectures. Here is the supplementary material, which includes the detailed architecture design: https://yongchengjing.com/pdf/supp-aaai-DIN.pdf

nile649 commented 4 years ago

What is "group" in the architecture? The original MobileNet doesn't have a residual module, assuming "res" in the architecture refers to a residual module. Also, could you provide the source of the pre-trained weights for the MobileNet and the model?

Thank you

ycjing commented 4 years ago

Hi @nile649,

Thank you for your interest. "Group" means the group number in convolutions. Please note that the original MobileNet is designed for the task of classification. Saying that a network is based on MobileNet V1 for style transfer or other tasks generally means that the network is built with the depthwise separable convolution module proposed in MobileNet V1.
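For illustration, here is a minimal PyTorch sketch of a MobileNet-V1-style depthwise separable convolution block, showing what the `groups` argument controls. The channel counts and normalization layers are placeholders, not our exact layer configuration:

```python
import torch.nn as nn

def depthwise_separable_conv(in_ch, out_ch, stride=1):
    """MobileNet-V1-style block: depthwise conv followed by pointwise conv."""
    return nn.Sequential(
        # Depthwise: groups == in_ch, i.e. one 3x3 filter per input channel.
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # Pointwise: 1x1 convolution that mixes channels (groups == 1).
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```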

For the code and model: due to a patent issue at Baidu Inc., I am required not to open-source the code at the current stage. If you need them now, please email the last author of the paper (Shilei Wen), who is responsible for this project at Baidu Inc.

nile649 commented 4 years ago

Thanks a lot, that clarified the confusion surrounding the whole network.

Fig. 4 is confusing, since there is no mention of how the features of the style image are extracted. Is the style image also passed through the MobileNet encoder, then through the weight/bias network + DIN, and then concatenated with the features of the content image at the respective layers?

ycjing commented 4 years ago

Hi @nile649,

Since I did not find any open-sourced PyTorch code that implemented the dynamic convolution, I implemented it myself. My implementation consists of two steps:

First, define the weight and bias networks, which dynamically produce the weight and bias parameters of the dynamic convolution according to different inputs. The architectures of these two networks can be found in the supplementary material. In this step, pay special attention to the shapes of the weight and bias;

Second, feed the generated weight and bias parameters to the PyTorch function `torch.nn.functional.conv2d(t_temp, weight_generated, bias_generated, stride=1, padding=0, dilation=1, groups=num_groups)`.
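To make the two steps concrete, here is a minimal, self-contained sketch of how this could look in PyTorch. The layer sizes, the `style_dim` style code, and the single-linear-layer weight/bias networks are illustrative assumptions (the actual architectures are in the supplementary material), and the sketch assumes batch size 1 since `F.conv2d` takes one weight tensor per call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Sketch of a dynamic convolution: weight and bias are generated per input."""
    def __init__(self, in_ch, out_ch, kernel_size=3, groups=1, style_dim=64):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.kernel_size, self.groups = kernel_size, groups
        # Step 1: networks that map a style code to convolution weight and bias.
        self.weight_net = nn.Linear(
            style_dim, out_ch * (in_ch // groups) * kernel_size * kernel_size)
        self.bias_net = nn.Linear(style_dim, out_ch)

    def forward(self, content_feat, style_code):
        # The generated weight must have shape (out_ch, in_ch // groups, kH, kW)
        # and the bias shape (out_ch,) -- the shape issue mentioned above.
        weight = self.weight_net(style_code).view(
            self.out_ch, self.in_ch // self.groups,
            self.kernel_size, self.kernel_size)
        bias = self.bias_net(style_code).view(self.out_ch)
        # Step 2: apply the generated parameters with the functional conv.
        return F.conv2d(content_feat, weight, bias,
                        stride=1, padding=0, dilation=1, groups=self.groups)

# Usage with illustrative shapes: batch-size-1 content features and a style code.
content_feat = torch.randn(1, 64, 32, 32)
style_code = torch.randn(64)
out = DynamicConv(in_ch=64, out_ch=64)(content_feat, style_code)
```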

Please let me know if there is anything that is not clear. Thank you.

nile649 commented 4 years ago

Thanks for clarifying. I am amazed that you got MobileNet working; as far as I know, you are the only team to have done it. I am more interested in seeing what exactly it learns. If you could clarify the following last point of confusion, that would be great.

Fig. 4 is confusing, since there is no mention of how the features of the style image are extracted. Is the style image also passed through the MobileNet encoder, then through the weight/bias network + DIN, and then concatenated with the features of the content image at the respective layers?

ycjing commented 4 years ago

Hi @nile649,

Thank you. Actually, our work is not the first to get MobileNet working in style transfer (e.g., https://heartbeat.fritz.ai/creating-a-17kb-style-transfer-model-with-layer-pruning-and-quantization-864d7cc53693), but it is indeed the first to make it work for the Arbitrary-Style-Per-Model type of stylization algorithm.

As for the question, the corresponding process is presented in Fig. 3, where the style features are extracted by a pre-trained VGG encoder. The VGG encoder we use here is exactly the same as in AdaIN (https://github.com/xunhuang1995/AdaIN-style). The extracted style features are then transformed into the weight and bias, which are used to convolve with the content features.
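Roughly, that flow could be sketched as below. The placeholder encoders, the layer sizes, and the pooling of the style features into a fixed-length style code are my own illustrative assumptions, not the exact design in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the real encoders: a pre-trained VGG (as in AdaIN) for the style
# image, and a lightweight MobileNet-style encoder for the content image.
vgg_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
mobilenet_encoder = nn.Sequential(
    nn.Conv2d(3, 3, 3, padding=1, groups=3),  # depthwise
    nn.Conv2d(3, 64, 1),                      # pointwise
    nn.ReLU(),
)

# Placeholder weight/bias networks (see the supplementary material for the
# real architectures).
weight_net = nn.Linear(64, 64 * 64 * 3 * 3)
bias_net = nn.Linear(64, 64)

style_feat = vgg_encoder(torch.randn(1, 3, 256, 256))          # style features (VGG)
style_code = style_feat.mean(dim=(0, 2, 3))                    # collapse to a 64-d code
content_feat = mobilenet_encoder(torch.randn(1, 3, 256, 256))  # content features

# Style features -> weight and bias -> convolution with the content features.
weight = weight_net(style_code).view(64, 64, 3, 3)
bias = bias_net(style_code)
stylized_feat = F.conv2d(content_feat, weight, bias, stride=1, padding=1)
```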

nile649 commented 4 years ago

Got it, thanks. I think I understand the whole thing now. It is then very similar to our paper, but for different objectives (https://arxiv.org/abs/1909.02165). Not using VGG increased the weight of our model, since I had to train the module from scratch, and it only worked for a particular task.