hnarayanan / artistic-style-transfer

Convolutional neural networks for artistic style transfer.
https://harishnarayanan.org/writing/artistic-style-transfer/
355 stars 79 forks

image resize for general cases #3

Closed kewellcjj closed 6 years ago

kewellcjj commented 7 years ago

Hi Harish,

There may be an error in the 6th notebook, where you resize the image using resize((height, width)). It should be resize((width, height)), as I found when trying a more general rectangular picture. The example in your notebook works only because height = width.

BTW, what's your laptop configuration? Each iteration takes much longer on my 2-year-old laptop.

Thanks for your excellent illustration of the whole process.

J

hnarayanan commented 7 years ago

Thank you for pointing it out to me!

And my laptop is a MacBook Pro from 2013, but I might have run that notebook on a machine with a K80 GPU (even though the comment was written when it was first run on a laptop).

hnarayanan commented 7 years ago

Hi! I spent some time fiddling with this today, and while I agree that the height and width are swapped in the image resizing, I am not sure how you're even getting it to work with width != height, as the input to the VGG network takes a square image.

If you can show me some sample code of how you're working with your images, I will fix the ordering of the height/width parameters.

AdityaSoni19031997 commented 6 years ago

Why not change the VGG Model itself?

hnarayanan commented 6 years ago

Because that's beyond the scope of this explanatory project. This repo is just an accompaniment to a detailed blog post on this stuff.

kewellcjj commented 6 years ago

@hnarayanan I'm not familiar with VGG, but from how CNNs work I would expect them to handle general rectangular images. Have you run your code with width != height, and did you get an error? Below is the code I used (the only changes are the width and height, and maybe some weights):

# Imports needed by the code below
import time

import numpy as np
from PIL import Image
from keras import backend
from keras.applications.vgg16 import VGG16
from scipy.optimize import fmin_l_bfgs_b

height = 200
width = 300

# PIL's resize() takes (width, height)
content_image_path = 'images/jj.jpg'
content_image = Image.open(content_image_path)
content_image = content_image.resize((width, height))

style_image_path = 'images/styles/gothic.jpg'
style_image = Image.open(style_image_path)
style_image = style_image.resize((width, height))

# Arrays have shape (1, height, width, 3) after adding the batch dimension
content_array = np.asarray(content_image, dtype='float32')
content_array = np.expand_dims(content_array, axis=0)
print(content_array.shape)

style_array = np.asarray(style_image, dtype='float32')
style_array = np.expand_dims(style_array, axis=0)
print(style_array.shape)

# Subtract the mean pixel values and reverse the channel order
content_array[:, :, :, 0] -= 103.939
content_array[:, :, :, 1] -= 116.779
content_array[:, :, :, 2] -= 123.68
content_array = content_array[:, :, :, ::-1]

style_array[:, :, :, 0] -= 103.939
style_array[:, :, :, 1] -= 116.779
style_array[:, :, :, 2] -= 123.68
style_array = style_array[:, :, :, ::-1]

content_image = backend.variable(content_array)
style_image = backend.variable(style_array)
combination_image = backend.placeholder((1, height, width, 3))

input_tensor = backend.concatenate([content_image, style_image, combination_image], axis=0)

model = VGG16(input_tensor=input_tensor, weights='imagenet', include_top=False)

layers = dict([(layer.name, layer.output) for layer in model.layers])

content_weight = 0.025
style_weight = 1
total_variation_weight = 1.0

loss = backend.variable(0.)

def content_loss(content, combination):
    return backend.sum(backend.square(combination - content))

# Index 0 of the batch is the content image, index 2 the combination image
layer_features = layers['block2_conv2']
content_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]

loss += content_weight * content_loss(content_image_features, combination_features)

def gram_matrix(x):
    # Flatten each channel into a row, then take inner products between channels
    features = backend.batch_flatten(backend.permute_dimensions(x, (2, 0, 1)))
    gram = backend.dot(features, backend.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = height * width
    return backend.sum(backend.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

feature_layers = ['block1_conv2', 'block2_conv2',
                  'block3_conv3', 'block4_conv3',
                  'block5_conv3']
for layer_name in feature_layers:
    layer_features = layers[layer_name]
    style_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_features, combination_features)
    loss += (style_weight / len(feature_layers)) * sl

def total_variation_loss(x):
    # Squared differences between vertically (a) and horizontally (b) adjacent pixels
    a = backend.square(x[:, :height-1, :width-1, :] - x[:, 1:, :width-1, :])
    b = backend.square(x[:, :height-1, :width-1, :] - x[:, :height-1, 1:, :])
    return backend.sum(backend.pow(a + b, 1.25))

loss += total_variation_weight * total_variation_loss(combination_image)

grads = backend.gradients(loss, combination_image)

outputs = [loss]
outputs += grads
f_outputs = backend.function([combination_image], outputs)

def eval_loss_and_grads(x):
    # The optimiser passes a flat vector; restore the image shape first
    x = x.reshape((1, height, width, 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    grad_values = outs[1].flatten().astype('float64')
    return loss_value, grad_values

class Evaluator(object):
    # Caches the gradients computed alongside the loss, so that
    # fmin_l_bfgs_b can request them separately without recomputation.

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()

x = np.random.uniform(0, 255, (1, height, width, 3)) - 128.

iterations = 10

for i in range(iterations):
    print('Start of iteration', i)
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    end_time = time.time()
    print('Iteration %d completed in %ds' % (i, end_time - start_time))

hnarayanan commented 6 years ago

Does this mean your code works for you? I am quite certain the code as is in the repository (notebook 6) is only valid when width=height. (As in you'll even get an error if you try to give it a non-square input.)

kewellcjj commented 6 years ago

Mine runs without error. As I mentioned in my first post, setting different values for height and width is not enough: when resizing the image you should call resize((width, height)) instead of the resize((height, width)) in your notebook. There are several places where you need to make sure the order is correct.
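
A rough sketch of why a non-square input can pass through the network when include_top=False is used, as in the code above. It assumes the standard VGG16 layout: 3x3 convolutions with 'same' padding, which preserve height and width, and 2x2 max pooling with stride 2 between blocks. The helper name vgg16_feature_shapes is hypothetical and only traces shapes; it does not run the model.

```python
# Sketch: trace feature-map sizes through a VGG16-style convolutional base.
# Assumes size-preserving 3x3 'same' convolutions and 2x2 max pooling with
# stride 2 (floor division) between blocks, with no dense layers on top.

def vgg16_feature_shapes(height, width):
    """Return {block_name: (height, width, channels)} for each block's conv outputs."""
    channels_per_block = [64, 128, 256, 512, 512]
    shapes = {}
    h, w = height, width
    for i, channels in enumerate(channels_per_block, start=1):
        shapes['block%d' % i] = (h, w, channels)  # convolutions keep h and w
        h, w = h // 2, w // 2                      # pooling halves each dimension
    return shapes

shapes = vgg16_feature_shapes(200, 300)
for name in sorted(shapes):
    print(name, shapes[name])
```

Nothing in this chain requires height == width. A fixed input size only becomes necessary once the fully connected layers are included, which this thread's code does not do.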

hnarayanan commented 6 years ago

Great, then I will pay more careful attention to the sizes of the objects flowing through.
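
As one illustration of tracking those sizes, here is a NumPy sketch of the Gram matrix used in the style loss. It mirrors the permute-then-flatten approach of gram_matrix in the code above, but uses plain NumPy rather than the Keras backend, and the random arrays are stand-ins for real feature maps.

```python
import numpy as np

def gram_matrix(features):
    """Map a (height, width, channels) feature map to a (channels, channels) Gram matrix."""
    channels = features.shape[-1]
    # Move channels first, then flatten the spatial dimensions into one axis
    flat = np.transpose(features, (2, 0, 1)).reshape(channels, -1)
    return flat @ flat.T

rng = np.random.default_rng(0)
square = rng.standard_normal((50, 50, 128))  # stand-in for a square feature map
rect = rng.standard_normal((50, 75, 128))    # stand-in for a rectangular one

print(gram_matrix(square).shape, gram_matrix(rect).shape)
```

Both results have shape (channels, channels), so the style loss itself does not care whether the image is square. Only the explicit height and width used in slicing and reshaping need to be in the right order.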

hnarayanan commented 6 years ago

Thank you for reporting this. I first fixed it in some slightly more complex way, then realised it was simply me misunderstanding the API of image.resize() in PIL. Changing the order of width and height in the initial input image resize fixed the code.
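
A minimal sketch of the ordering mismatch described above: PIL's resize() takes (width, height), while the resulting NumPy array has shape (height, width, channels). The blank image is a stand-in for a file loaded with Image.open(), and the sizes are the ones used earlier in this thread.

```python
import numpy as np
from PIL import Image

height, width = 200, 300

# Blank stand-in image; a real run would use Image.open(path) instead
image = Image.new('RGB', (640, 480))

resized = image.resize((width, height))       # PIL expects (width, height)
array = np.asarray(resized, dtype='float32')  # NumPy gives (height, width, channels)

print(resized.size)
print(array.shape)
```

With height == width the two orderings are indistinguishable, which is why the original notebook ran without error on its square example.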