Closed kewellcjj closed 6 years ago
Thank you for pointing it out to me!
And my laptop is a MacBook Pro from 2013, but I might have run that sheet on a machine with a k80 GPU (even though the comment was written when it was first run on a laptop).
Hi! I spent some time fiddling with this today. I agree there is a problem with the height and width being swapped in the image resizing, but I am not sure how you're even getting it to work with width != height, as the input to the VGG network takes a square image.
If you can show me some sample code of how you're working with your images, I will fix the ordering of the height/width parameters.
Why not change the VGG Model itself?
Because that's beyond the scope of this explanatory project. This repo is just an accompaniment to a detailed blog post on this stuff.
@hnarayanan I'm not familiar with VGG, but I feel a CNN should work for general rectangular images given how it works. Have you run your code with width != height, and did you get an error? Below is the code I used (the only changes are the width and height, and maybe some weights):
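import time

import numpy as np
from PIL import Image
from keras import backend
from keras.applications.vgg16 import VGG16
from scipy.optimize import fmin_l_bfgs_b

height = 200
width = 300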
# PIL's resize takes its size argument as (width, height).
content_image_path = 'images/jj.jpg'
content_image = Image.open(content_image_path)
content_image = content_image.resize((width, height))

style_image_path = 'images/styles/gothic.jpg'
style_image = Image.open(style_image_path)
style_image = style_image.resize((width, height))

# The arrays have shape (1, height, width, 3) after adding the batch axis.
content_array = np.asarray(content_image, dtype='float32')
content_array = np.expand_dims(content_array, axis=0)
print(content_array.shape)

style_array = np.asarray(style_image, dtype='float32')
style_array = np.expand_dims(style_array, axis=0)
print(style_array.shape)

content_array[:, :, :, 0] -= 103.939
content_array[:, :, :, 1] -= 116.779
content_array[:, :, :, 2] -= 123.68
content_array = content_array[:, :, :, ::-1]

style_array[:, :, :, 0] -= 103.939
style_array[:, :, :, 1] -= 116.779
style_array[:, :, :, 2] -= 123.68
style_array = style_array[:, :, :, ::-1]

content_image = backend.variable(content_array)
style_image = backend.variable(style_array)
combination_image = backend.placeholder((1, height, width, 3))

input_tensor = backend.concatenate([content_image,
                                    style_image,
                                    combination_image], axis=0)

model = VGG16(input_tensor=input_tensor, weights='imagenet',
              include_top=False)

layers = dict([(layer.name, layer.output) for layer in model.layers])

content_weight = 0.025
style_weight = 1
total_variation_weight = 1.0

loss = backend.variable(0.)

def content_loss(content, combination):
    return backend.sum(backend.square(combination - content))

layer_features = layers['block2_conv2']
content_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]

loss += content_weight * content_loss(content_image_features,
                                      combination_features)

def gram_matrix(x):
    features = backend.batch_flatten(backend.permute_dimensions(x, (2, 0, 1)))
    gram = backend.dot(features, backend.transpose(features))
    return gram
def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = height * width
    return backend.sum(backend.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
feature_layers = ['block1_conv2', 'block2_conv2',
                  'block3_conv3', 'block4_conv3',
                  'block5_conv3']
for layer_name in feature_layers:
    layer_features = layers[layer_name]
    style_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_features, combination_features)
    loss += (style_weight / len(feature_layers)) * sl

def total_variation_loss(x):
    a = backend.square(x[:, :height-1, :width-1, :] - x[:, 1:, :width-1, :])
    b = backend.square(x[:, :height-1, :width-1, :] - x[:, :height-1, 1:, :])
    return backend.sum(backend.pow(a + b, 1.25))

loss += total_variation_weight * total_variation_loss(combination_image)

grads = backend.gradients(loss, combination_image)

outputs = [loss]
outputs += grads
f_outputs = backend.function([combination_image], outputs)

def eval_loss_and_grads(x):
    x = x.reshape((1, height, width, 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    grad_values = outs[1].flatten().astype('float64')
    return loss_value, grad_values
class Evaluator(object):
    # Computes loss and gradients in one pass, then hands them out separately,
    # because fmin_l_bfgs_b asks for them through two different callbacks.

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
x = np.random.uniform(0, 255, (1, height, width, 3)) - 128.
iterations = 10
for i in range(iterations):
    print('Start of iteration', i)
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    end_time = time.time()
    print('Iteration %d completed in %ds' % (i, end_time - start_time))
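As a rough check on why a rectangular input can pass through the convolutional part of VGG16 (the code above builds the model with include_top=False), the spatial size of each feature map can be worked out by hand. The sketch below assumes the standard VGG16 layout: 3x3 convolutions with 'same' padding, which keep the spatial size, and a 2x2 max pooling with stride 2 after each block, which halves it and rounds down. The helper name vgg16_feature_shapes is hypothetical and is not part of Keras or the notebook.

```python
# Spatial sizes of the VGG16 feature maps used in the loss, for a given input.
# Assumption: 'same'-padded convolutions and unpadded 2x2/stride-2 pooling.

def vgg16_feature_shapes(height, width):
    """Return {layer_name: (height, width)} for the last conv layer of each block."""
    last_conv_layers = ['block1_conv2', 'block2_conv2', 'block3_conv3',
                        'block4_conv3', 'block5_conv3']
    shapes = {}
    h, w = height, width
    for name in last_conv_layers:
        shapes[name] = (h, w)    # 'same' convolutions keep the spatial size
        h, w = h // 2, w // 2    # 2x2 pooling halves it, rounding down
    return shapes


shapes = vgg16_feature_shapes(200, 300)
for name, hw in shapes.items():
    print(name, hw)
```

Nothing in this arithmetic requires height == width. A fixed input size only becomes necessary once the fully connected top layers are included, since they expect a flattened feature vector of a fixed length.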
Does this mean your code works for you? I am quite certain the code as is in the repository (notebook 6) is only valid when width=height. (As in you'll even get an error if you try to give it a non-square input.)
Mine runs without error. As I mentioned in my first post, you not only need to set different values for height and width; when resizing the image you should also call resize((width, height)) instead of the resize((height, width)) in your notebook. There are several places where you need to make sure the order is correct.
Great, then I will pay more careful attention to the sizes of the objects flowing through.
Thank you for reporting this. I first fixed it in a slightly more complex way, then realised I had simply misunderstood the API of image.resize() in PIL. Changing the order of width and height in the initial input image resize fixed the code.
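For anyone else hitting this, the mismatch is easy to see in isolation: PIL takes and reports sizes as (width, height), while the NumPy array built from the image has shape (height, width, channels). The snippet below is only a small illustration using a blank image created in memory (the 640x480 source size is arbitrary); it is not code from the notebook.

```python
import numpy as np
from PIL import Image

height, width = 200, 300

# Image.new and Image.resize both take the size as (width, height).
image = Image.new('RGB', (640, 480))
resized = image.resize((width, height))

# Converting to an array puts rows (height) first.
array = np.asarray(resized, dtype='float32')

print(resized.size)   # PIL order: (width, height)
print(array.shape)    # NumPy order: (height, width, channels)
```

With height == width the two orders give the same result, which is why the swapped arguments went unnoticed on square inputs.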
Hi Harish,
There could be an error in the 6th notebook where you resize the image using resize((height, width)). It turns out it should be resize((width, height)), which I noticed when trying a more general rectangular picture. The example in your notebook works because height = width.
BTW, what's your laptop configuration? Each iteration takes much longer on my 2-year-old laptop.
Thanks for your excellent illustration of the whole process.
J