rekkit / style-bank-tensorflow

A TensorFlow implementation of the StyleBank style transfer neural network.

Code detail question about vgg16_weights.npz #1

Open VictoryLN opened 5 years ago

VictoryLN commented 5 years ago

In vgg16.py, there is one line that I don't really understand. When assigning the initial values to the VGG16 variables, your code is:

```python
def load_weights(self, weights_path, session=None):
    if self.session is None:
        self.session = session
    if self.session is None:
        raise ValueError(
            "The session can not be None if you want to load weights."
        )

    self.weights = np.load(weights_path)
    keys = sorted(self.weights.keys())
    for i, k in enumerate(keys):
        if i == len(self.params):
            break
        print(i, k, np.shape(self.weights[k]))
        self.session.run(
            self.params[i].assign(
                self.weights[k][:, :, ::-1, :]
                if (k.endswith("W") and i == 0)
                else self.weights[k]
            )
        )
```

while I find others write (http://www.cs.toronto.edu/~frossard/post/vgg16/):

```python
def load_weights(self, weight_file, sess):
    weights = np.load(weight_file)
    keys = sorted(weights.keys())
    for i, k in enumerate(keys):
        print i, k, np.shape(weights[k])
        sess.run(self.parameters[i].assign(weights[k]))
```

Can you explain when and why the initial value is `weights[k][:, :, ::-1, :]`? Is this caused by different .npz files?

BTW, I find that the encoder gradients should be scaled according to the bank kernel gradients and lambda as in the paper, whereas you choose to clip the bank gradients and only optimize the bank variables during the stylize_op training. If this works well, does that mean the auto-encoder could be trained alone? thx

rekkit commented 5 years ago
  1. There are two conventions for loading images: RGB and BGR. The VGG16 weights were trained on BGR inputs, while the convention that I like to use is RGB. Hence, if I didn't reverse the third dimension (`weights[k][:, :, ::-1, :]`), I would be multiplying:

    • the red layer of the image with the weights trained for the blue layer,
    • the green layer of the image with the weights trained for the green layer,
    • the blue layer of the image with the weights trained for the red layer.
  2. You are correct regarding the second point. I simply never got around to doing it exactly the way they did it in the paper. Feel free to create a pull request.
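The channel flip in point 1 can be checked with a toy numpy sketch (illustrative shapes only, not the repo's code): flipping the input-channel axis of the first conv kernel is equivalent to feeding the image with its channels reversed. A 1x1 convolution is used here because it reduces to a dot product over the channel axis.

```python
import numpy as np

rng = np.random.default_rng(0)

image_rgb = rng.standard_normal((4, 4, 3))      # H x W x C, channels in RGB order
kernel_bgr = rng.standard_normal((1, 1, 3, 8))  # kh x kw x in_ch x out_ch, "trained" on BGR

def conv1x1(img, w):
    # A 1x1 convolution: dot product over the input-channel axis.
    return np.einsum('hwc,co->hwo', img, w[0, 0])

# Option A: flip the kernel's input-channel axis (what load_weights does).
out_flip_weights = conv1x1(image_rgb, kernel_bgr[:, :, ::-1, :])
# Option B: flip the image's channel axis and keep the kernel as-is.
out_flip_image = conv1x1(image_rgb[:, :, ::-1], kernel_bgr)

print(np.allclose(out_flip_weights, out_flip_image))  # True
```

Both options produce the same activations, which is why only the first layer's kernel needs the flip: every layer after it sees identical inputs either way.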

Let me know if anything is unclear.
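Regarding point 2, the current scheme (update only the bank variables during the stylize step, leave the auto-encoder untouched) can be sketched framework-free; the variable names below are hypothetical, not the repo's:

```python
import numpy as np

# Hypothetical parameter store: auto-encoder weights plus one style bank.
params = {
    "encoder/w": np.ones(3),
    "decoder/w": np.ones(3),
    "bank/style0": np.ones(3),
}
# Pretend every variable received the same gradient from the stylize loss.
grads = {name: np.full(3, 0.1) for name in params}

lr = 1.0
for name, g in grads.items():
    if name.startswith("bank/"):  # stylize step: update bank variables only
        params[name] -= lr * g

# encoder/w is unchanged; bank/style0 has moved.
```

In TF1 terms this corresponds to passing only the bank variables as `var_list` to the optimizer's `minimize` call for the stylize op.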

VictoryLN commented 5 years ago

Inverting the channels won't influence the output of the layer, so we can invert only the first layer's weights and keep the other layers' weights in their original format. Am I right? But in vgg16.forward_conv_output(), why does the input image use the BGR format when you have already performed the inversion of the weight channels?

Second, I find that your style_loss output is smaller than mine. When I compared your code with mine, I found a small difference: in network.py, StyleBank.initialize_style_loss, the line `self.style_loss = self.style_loss * self.style_loss_param / len(styled_content_outputs)` is inside the for loop, while my code puts it outside the loop. This means your code scales the outputs of the earlier layers down many times, not once. Did you do this on purpose?
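The difference between the two placements can be illustrated with toy numbers (hypothetical loss values, not the repo's):

```python
# Per-layer style losses and the scaling factor (illustrative values only).
layer_losses = [4.0, 2.0, 1.0]  # one entry per VGG layer
param = 0.5                     # stands in for style_loss_param
n = len(layer_losses)

# Scaling outside the loop: every layer is scaled exactly once.
outside = sum(layer_losses) * param / n

# Scaling inside the loop: the running sum is rescaled on every iteration,
# so earlier layers end up multiplied by (param / n) more than once.
inside = 0.0
for loss in layer_losses:
    inside = (inside + loss) * param / n

print(outside, inside)  # the in-loop variant is strictly smaller here
```

With these numbers, `outside` is 7 * 0.5 / 3 while `inside` collapses to 13/54, which matches the observation that the in-loop version produces a smaller style_loss.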

VictoryLN commented 5 years ago

Can you share some tips on how to find good hyperparameters (alpha, beta), and especially their magnitudes? You know, training a model takes a really long time. During the training process, do you test and output some images? I find my output images are full of small white circles, and I don't know whether that's normal or not. Would you like to share some training output images? It would be better if you could send me some images (I don't know how to upload images in issues). My email: ywb.vic.ln@gmail.com. Looking forward to your reply. Thx