Closed: diPDew closed this issue 9 years ago
> Does the training process only involve two images (one for content and the other for style), fine-tuning the existing model to update the weights of the convolutional layers?
The weights of the convolutional layers are held fixed, so there is no fine-tuning of the model parameters. Instead, we optimize the input to the convnet (i.e., a 3×H×W tensor) to minimize the cost function given in Eq. (7) in the paper. This is done by backpropagation, using the gradient of the loss with respect to the input to the network.
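As a toy illustration of optimizing the input rather than the weights (this is not the actual VGG pipeline; the fixed random linear "feature extractor" `W` and the quadratic feature-matching loss below are stand-ins for the network and for Eq. (7)), a minimal numpy sketch might run gradient descent on the input `x` while `W` stays frozen:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed "network" weights: a random linear feature extractor (stand-in for VGG).
# These are never updated.
W = rng.standard_normal((8, 4))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Target features computed once from a fixed "content image".
content = rng.standard_normal(4)
target = W @ content

# Optimize the INPUT x (not W) to minimize ||W x - target||^2.
x = rng.standard_normal(4)  # random initialization, as in the paper
lr = 0.1
for _ in range(500):
    grad = 2.0 * W.T @ (W @ x - target)  # gradient of the loss w.r.t. the input
    x -= lr * grad

loss = float(np.sum((W @ x - target) ** 2))
print(loss)
```

The same structure carries over to the real method: replace the linear map with forward passes through frozen conv layers, and the quadratic loss with the content plus style terms of Eq. (7).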
> If so, how many iterations are generally needed to produce the final results?
This depends on the convnet architecture, initialization, and optimization method. I find that with VGG-19, L-BFGS and random initialization, you would usually get good results by iteration 300-400 (and sometimes even sooner).
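To illustrate the optimizer choice, here is a hedged sketch using SciPy's L-BFGS-B on the same kind of input-optimization problem from a random initialization (the quadratic feature-matching loss is again a stand-in for Eq. (7), not the real style-transfer objective); on problems like this, L-BFGS typically needs far fewer iterations than plain gradient descent:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Frozen "feature extractor" and target features (stand-ins for VGG and Eq. (7)).
W = rng.standard_normal((8, 4))
target = W @ rng.standard_normal(4)

def loss_and_grad(x):
    # Returns the loss and its gradient w.r.t. the input, as L-BFGS requires.
    r = W @ x - target
    return float(r @ r), 2.0 * (W.T @ r)

x0 = rng.standard_normal(4)  # random initialization
res = minimize(loss_and_grad, x0, jac=True, method="L-BFGS-B",
               options={"maxiter": 400})
print(res.success, res.fun)
```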
Hope that helps.
I see. Thanks a lot!
I'm a bit lost on how to train/fine-tune a deep model based on the proposed loss function in Eq. (7) of the original paper. Does the training process only involve two images (one for content and the other for style), fine-tuning the existing model to update the weights of the convolutional layers? If so, how many iterations are generally needed to produce the final results (i.e., reconstructed images from different convolutional layers)?
Thanks very much in advance.