naoto0804 / pytorch-AdaIN

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization' [Huang+, ICCV2017]
MIT License

testing performance #10

Closed InstantWindy closed 6 years ago

InstantWindy commented 6 years ago

Hello, did you train using your train.py file? The total number of iterations there is 100,000. I trained following your train.py, but the final test results did not reach the results reported in the paper. How were your test results? Thanks!

naoto0804 commented 6 years ago

The original implementation used 160000 iterations. Thank you for pointing this out. I agree that a longer training period will produce better results. https://github.com/xunhuang1995/AdaIN-style/blob/master/train.lua#L38

I accidentally deleted the checkpoint of the model, so I cannot show the results, I'm sorry. How bad were the results compared to the paper? Could you show some of the bad results?

InstantWindy commented 6 years ago

Is VGG-19 not trained during training? I ran a total of 200,000 iterations, but the result is still not good. I set alpha=0.75 and got this test result: 2007_000032_stylized_flower_of_life

InstantWindy commented 6 years ago

The style is flower_of_life, just one style. I set the initial learning rate to 1e-4 and batch_size=12, and adopted lr decay:

```python
def adjust_learning_rate(optimizer, iteration_count):
    """Imitating the original implementation"""
    lr = init_lr / (1.0 + lr_decay * iteration_count)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```

I don't know why; could you help me? Thanks!
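For reference, this is an inverse-time decay; a small sketch of how the learning rate falls over the iteration counts discussed above (init_lr=1e-4 matches the value mentioned here, while lr_decay=5e-5 is an assumed value, not necessarily the exact setting used in this run):

```python
# Inverse-time decay as in the snippet above. init_lr=1e-4 matches the value
# mentioned above; lr_decay=5e-5 is an assumed value, not necessarily the one
# used in this training run.
init_lr, lr_decay = 1e-4, 5e-5

def lr_at(iteration_count):
    return init_lr / (1.0 + lr_decay * iteration_count)

for it in (0, 10_000, 100_000, 160_000):
    print(it, lr_at(it))
# 0        1.0e-4
# 10000    ~6.7e-5
# 100000   ~1.7e-5
# 160000   ~1.1e-5
```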

naoto0804 commented 6 years ago

Just in case, let me ask one question first: in the training phase, did you follow the original paper to use MS-COCO for the content images and WikiArt for the style images?

[Vgg19 fixed?] According to the original paper,

the encoder f is fixed to the first few layers (up to relu4_1) of a pre-trained VGG-19.

The figure you attached looks curious; however, I suspect that the color and style of the images you used are originally very close to each other, which leads to this unsurprising result (although I cannot tell what the original content image you used looks like). Sorry to bother you again, but could you test the combinations of images in Fig. 4, Fig. 7, or Fig. 8 for both the style and the content?

InstantWindy commented 6 years ago

I use VOC2012 for the content images. Do you mean that my training content images and style images are so close to each other that the training result is poor? But must the content training set be the COCO dataset? Also, the paper does not clearly state whether VGG-19 is trained during training. I think your code does not train VGG-19, because you set requires_grad=False.

naoto0804 commented 6 years ago

[Train] It is natural to follow exactly the same settings if you claim that the final results fail to reach those of the paper, isn't it? Although I also think the choice of dataset should not matter, it possibly does. I'd like to consider the effect of the dataset choice and possible faults in my code separately.

[Test] No, I mean that the images used for your test are very close to each other. It's difficult to draw a conclusion from only a single example. If possible, please test the other combinations used in the original paper before concluding that the training went wrong. https://github.com/xunhuang1995/AdaIN-style#content-style-trade-off

[VGG model] As I read the original code, it seems that the loss is not back-propagated to the encoder. https://github.com/xunhuang1995/AdaIN-style/blob/facb6b619d51564fd5040ba71d15c980a889dddc/train.lua#L270-L271
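To make this concrete, here is a minimal PyTorch sketch of keeping the encoder fixed. It uses torchvision's pre-trained VGG-19 for illustration (this repo ships its own VGG weights, so the module names and the exact layer slice below are assumptions, not the repo's code):

```python
import torch.nn as nn
from torchvision import models

# Encoder f: the first layers of a pre-trained VGG-19, up to relu4_1.
# The slice [:21] is an assumption based on torchvision's layer ordering.
vgg = models.vgg19(pretrained=True).features
encoder = nn.Sequential(*list(vgg.children())[:21])

# Freeze it so no gradients flow into the encoder; only the decoder
# parameters would be handed to the optimizer during training.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()
```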

InstantWindy commented 6 years ago

OK, thank you very much! When you trained, did you get the results from the paper?

naoto0804 commented 6 years ago

Finally, I recall that using a higher weight for the style loss, such as --style_weight 10.0, produced perceptually better results. Here I show the results using the combination shown here: https://github.com/xunhuang1995/AdaIN-style#examples

I have pushed everything to the master branch, including the default style_weight of 10.0. Thanks for pointing this out.

results

InstantWindy commented 6 years ago

Did you only change this style_weight? So was the style_weight value too small, leading to the poor results? Thank you very much!

naoto0804 commented 6 years ago

Yes, the smaller style_weight made the decoder focus more on the content. In addition, we have to provide a large number of images for training. I suspect that your results also suffer from insufficient data. The number of images in VOC is much smaller than in MS-COCO (I used 80k images).
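To make the trade-off concrete, a minimal sketch of the weighted objective (the dummy tensors stand in for the real content and style losses; names like loss_c and loss_s are illustrative, not the repo's exact variables):

```python
import torch

# Stand-ins for the real losses: loss_c compares decoder-output features to
# the AdaIN target, loss_s compares channel-wise mean/std statistics of
# several VGG layers between the output and the style image.
loss_c = torch.tensor(1.0, requires_grad=True)
loss_s = torch.tensor(1.0, requires_grad=True)

content_weight = 1.0
style_weight = 10.0  # a larger value pushes the decoder toward the style

total_loss = content_weight * loss_c + style_weight * loss_s
total_loss.backward()
```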

InstantWindy commented 6 years ago

Thank you very much! I find it strange that the trained decoder seems to have little to do with the content images: when I tested with the trained decoder weights you provided, the results were similar to the paper, even though my test data come from VOC2012 while you trained on the MS-COCO dataset.

InstantWindy commented 6 years ago

Hello! Do you know which style transfer methods can replace the background of the content image with the background of the style image? Your test code does not implement spatial control, does it?

naoto0804 commented 6 years ago

Yes, I did not implement it.
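For reference, the original Lua implementation supports spatial control by running AdaIN separately for masked regions. Below is a rough, simplified PyTorch sketch of that idea (not code from this repo, and all names are illustrative; the original extracts and normalizes the features under each mask region separately, while this sketch simply composites two fully stylized feature maps with the mask):

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Plain AdaIN: match channel-wise mean/std of content to style."""
    b, c = content_feat.size()[:2]
    c_mean = content_feat.view(b, c, -1).mean(2).view(b, c, 1, 1)
    c_std = content_feat.view(b, c, -1).std(2).view(b, c, 1, 1) + eps
    s_mean = style_feat.view(b, c, -1).mean(2).view(b, c, 1, 1)
    s_std = style_feat.view(b, c, -1).std(2).view(b, c, 1, 1) + eps
    return (content_feat - c_mean) / c_std * s_std + s_mean

def masked_adain(content_feat, style_fg, style_bg, mask):
    """Stylize foreground and background with different style features.
    mask: (B, 1, H, W) binary map at feature resolution, 1 = foreground."""
    return mask * adain(content_feat, style_fg) + (1 - mask) * adain(content_feat, style_bg)

# Toy shapes mimicking relu4_1 features.
content = torch.randn(1, 512, 32, 32)
style_a = torch.randn(1, 512, 32, 32)  # style for the foreground
style_b = torch.randn(1, 512, 32, 32)  # style for the background
mask = (torch.rand(1, 1, 32, 32) > 0.5).float()
target = masked_adain(content, style_a, style_b, mask)
```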

InstantWindy commented 6 years ago

Hello, are you a student? I am a graduate student. What is your research direction? I am doing image segmentation. I am a beginner. I think you are very helpful. Thanks!

naoto0804 commented 6 years ago

I'm a Ph.D. student. Please see here for details! https://naoto0804.github.io/

InstantWindy commented 6 years ago

I think the contentFeatureBG and targetFeature dimensions are different from the contentFeature dimension, but the code writes targetFeature = targetFeature:viewAs(contentFeature). I don't know why. tim 20180519154009