jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Non-issue: relative style layers weights #237

Open htoyryla opened 8 years ago

htoyryla commented 8 years ago

I have long been interested in the idea of adjusting the weights of the style layers separately. For instance, if I see that some layer has far larger or smaller losses than is typical, it would be interesting to be able to adjust its weight.

I have now implemented this in https://github.com/htoyryla/neural-style by adding a new parameter, -style_layer_weights, which accepts a comma-separated list of multipliers. The weight of each style layer is then multiplier * style_weight. If the parameter is omitted, each layer gets multiplier = 1.
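
The change itself is small; here is a minimal sketch of the idea (simplified from the fork linked above, so details may differ):

cmd:option('-style_layer_weights', '', 'comma-separated list of per-layer multipliers')

-- parse the comma-separated multipliers; a missing entry defaults to 1
local multipliers = {}
if params.style_layer_weights ~= '' then
  for w in string.gmatch(params.style_layer_weights, '[^,]+') do
    table.insert(multipliers, tonumber(w))
  end
end

-- each style layer's strength becomes multiplier * style_weight
for i, mod in ipairs(style_losses) do
  mod.strength = params.style_weight * (multipliers[i] or 1)
end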

I assume this is not for the average user, which is why I didn't make it a pull request. But there might be other experiment-oriented users around, so I am letting them know that this exists.

htoyryla commented 8 years ago

Interesting. I tried your code with my usual first test (the neural-style defaults) and found that it did not equalize the layers at all: the strength of all but one layer dropped to 1. I did not run this longer than iteration 100, as I am busy with other things for the rest of the week.

Oh yes. Your code is actually doing the opposite of equalization: the weak layers' strengths are pushed down. This amounts to selecting the strong layers and suppressing the others, at least in my test case.

Your case might be different. This is why I like experimenting with code: not to find any final truth or universal solution, but to get working results, even if only for one case.

(mod.strength values printed first:)

1   1   
2   1   
3   1   
4   100 
5   1   
Iteration 75 / 1000 
  Content 1 loss: 858358.671875 
  Style 1 loss: 234.571411  
  Style 2 loss: 1713.968994 
  Style 3 loss: 346.774841  
  Style 4 loss: 243431.494141   
  Style 5 loss: 4.711990    
  Total loss: 1104090.193253    
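
To see why: that code sets mod.strength = params.style_weight * mod.loss / maxloss (quoted further down the thread), so the iteration-75 losses above give roughly the following strengths, assuming the default style_weight of 100:

-- illustration only, using the style losses printed above
local style_weight = 100
local losses = {234.57, 1713.97, 346.77, 243431.49, 4.71}
local maxloss = 243431.49
for i, l in ipairs(losses) do
  -- layer 4 keeps the full weight; every other layer is pushed
  -- down to a small fraction of it
  print(i, style_weight * l / maxloss)
end
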
htoyryla commented 8 years ago

Looking back, I made convis as a quick way to see what happens inside the model, not as a tool for research or for evaluating models. Maybe my expectations ran wild when the images looked so nice, and for the lower layers also informative. Still, it shows where the activations are; one just needs to keep in mind not to read too much into that.

htoyryla commented 8 years ago

I've had @bmaltais's code running here with the neural-style defaults but varying the style weights. It responds very well to these changes, clearly better than my original from yesterday. The images produced look fine to me, though I have not compared them against anything. In general, the idea of suppressing weak layers resonates with my practice of using fewer layers, usually 2 to 3; this code does the selection automatically.

BTW, another interesting question is how to maintain appearance while increasing image size. It has been raised here a couple of times, but I have not heard of a solution. To me it feels that the solution does not lie in manipulating weights; rather, it somehow derives from the change in the effective area covered by each kernel, perhaps similar to the change in the convis highlights when @mystcat increased the image size. See the sketch below.
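
One way to quantify that effective area is the receptive field of a unit, which is fixed in input pixels regardless of image size. A rough sketch of the standard computation for VGG-19's 3x3 convolutions and 2x2 poolings (not from any of the code discussed here):

-- the receptive field r grows by (kernel - 1) * jump per layer;
-- each stride-2 pooling doubles the jump
local layers = {
  {'conv1_1','conv'}, {'conv1_2','conv'}, {'pool1','pool'},
  {'conv2_1','conv'}, {'conv2_2','conv'}, {'pool2','pool'},
  {'conv3_1','conv'}, {'conv3_2','conv'}, {'conv3_3','conv'},
  {'conv3_4','conv'}, {'pool3','pool'},
  {'conv4_1','conv'}, {'conv4_2','conv'}, {'conv4_3','conv'},
  {'conv4_4','conv'}, {'pool4','pool'},
  {'conv5_1','conv'}, {'conv5_2','conv'},
}
local r, jump = 1, 1
for _, l in ipairs(layers) do
  if l[2] == 'conv' then
    r = r + 2 * jump               -- 3x3 conv, stride 1
  else
    r = r + jump; jump = jump * 2  -- 2x2 max pool, stride 2
  end
  print(l[1], r)                   -- e.g. conv3_1 -> 24, conv5_1 -> 156
end

So a conv5_1 unit always sees about 156 input pixels: most of a 224px image, but only a small patch of a 896px one.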

bmaltais commented 8 years ago

I am wondering if a similar approach for content might also help when combined with style weight balancing. It would also be possible to favor content layers with high loss (or possibly low loss, in the case of content) using an autotuned content weight... Might be worth testing.

In my case it won't be of much help since I tend to use a content weight of 0 along with an image init instead of noise... but for noise init it might make a difference.

Update:

Original (style balanced for max loss only) vs. style balanced for max loss plus content balanced for min loss: https://twitter.com/netputing/status/743065822457794560

mystcat commented 8 years ago

Hannu,

I just got a little insight. It may be obvious to somebody, but anyway, here it is: neurons "like" features of the source image only at a particular scale, the scale at which they saw them when the model was trained. For example, neuron 62 on layer 5_1 definitely reacts to eyes and mouths, but it doesn't react that way when I feed the same image at a smaller scale. Here are 3 examples:

224px: image2-conv5_1-62

448px: image2-conv5_1-62

896px: image2-conv5_1-62

P.S.: I also changed convis so that it shows the real size of a neuron's activation translated to the size of the input image.

mystcat commented 8 years ago

Now I'm not so sure anymore about neuron 5_1_62. There are a lot of other activations throughout the image that don't fit the hypothesis.

mystcat commented 8 years ago

One more test of a neuron's activation to the same image at different scales: neuron 5_2-426. It likes eyes mostly, but lips too, to a lesser extent. I intentionally used an art picture, so that what shows up should be concept recognition rather than image patterns.

150px: test_face-relu5_2-426_150

224px: test_face-relu5_2-426

300px: test_face-relu5_2-426_300

350px: test_face-relu5_2-426_350

400px: test_face-relu5_2-426_400

500px: test_face-relu5_2-426_500

896px (activations are very weak at this scale): test_face-relu5_2-426_896

Apparently neuron activation becomes weaker when the scale of the feature increases beyond a particular point.
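
For anyone who wants to repeat this kind of test, here is a sketch of one way to probe a single unit at several scales (the file name and the module index are assumptions, not the exact convis code; the preprocessing is the usual caffe VGG one, as in neural-style):

require 'image'
require 'nn'
local loadcaffe = require 'loadcaffe'

local cnn = loadcaffe.load('models/VGG_ILSVRC_19_layers_deploy.prototxt',
                           'models/VGG_ILSVRC_19_layers.caffemodel', 'nn')
local mean = torch.Tensor({103.939, 116.779, 123.68})  -- caffe BGR channel means
local img = image.load('test_face.jpg', 3)             -- placeholder file name

for _, size in ipairs({150, 224, 300, 350, 400, 500, 896}) do
  local x = image.scale(img, size, size):mul(255)
  local bgr = torch.cat({x[{{3}}], x[{{2}}], x[{{1}}]}, 1)  -- RGB -> BGR
  for c = 1, 3 do bgr[c]:add(-mean[c]) end                  -- subtract channel means
  cnn:forward(bgr)
  local act = cnn:get(32).output[426]  -- module 32 = relu5_2 in this load order; channel 426
  print(size, act:max())               -- peak activation of the unit at this scale
end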

mystcat commented 8 years ago

This is the most probable cause of the problem with higher-resolution images that was discussed here before. The model just doesn't scale up, because it stops responding when the feature size increases. Adding more memory and using high-end GPUs won't help; higher-resolution images will look different anyway.

htoyryla commented 8 years ago

bmaltais notifications@github.com wrote on 15.6.2016 at 15:33:

I am wondering if a similar approach for content might also help when combined with style weight balancing. It would also be possible to favor content layers with high loss (or possibly low loss, in the case of content) using an autotuned content weight... Might be worth testing.

It should be doable in a similar fashion.

I didn’t originally implement layer specific weight for content as I most often use only one layer, but others have different approaches.

In my case it won't be of much help since I tend to use a content weight of 0 along with an image init instead of noise... but for noise init it might make a difference.

I was surprised to see content_weight 0 earlier but I get it… there are indeed many approaches and they can lead to different ways of using weights.


htoyryla commented 8 years ago

mystcat notifications@github.com wrote on 15.6.2016 at 17:45:

Hannu,

I just got a little insight. It may be obvious to somebody, but anyway, here it is: neurons "like" features of the source image only at a particular scale, the scale at which they saw them when the model was trained. For example, neuron 62 on layer 5_1 definitely reacts to eyes and mouths, but it doesn't react that way when I feed the same image at a smaller scale.

Yes, I guess this is more or less the same effect that changes the neural-style results when the image size is increased.

I would have thought, though, that VGG-19 was trained on 224px images, while what you show suggests larger training images.

htoyryla commented 8 years ago

Concerning facial features, this looks interesting: https://www.technologyreview.com/s/601684/machine-vision-algorithm-learns-to-transform-hand-drawn-sketches-into-photorealistic-images/?utm_campaign=socialflow&utm_source=facebook&utm_medium=post . I have also been thinking that models which have actually been trained to generate images (as the one here) might work better than those trained for classification. Maybe.

I have to stop now, and might not be able to respond before Sunday or Monday.

bmaltais commented 8 years ago

@htoyryla Here is the code for the content weight. It is ratio-based, so the balancing is applied progressively: applying it at full strength from the first iteration resulted in bad images. By the final iteration each content layer's strength settles at content_weight * mincloss / loss, i.e. fully balanced toward the lowest-loss layer:

  local function feval(x)
    num_calls = num_calls + 1
    net:forward(x)
    local grad = net:updateGradInput(x, dy)
    local loss = 0

    -- fraction of the run completed; the balancing is phased in with it
    local ratio = num_calls / params.num_iterations

    if (num_calls > 1) and (params.content_weight > 0) then
      -- find the smallest and largest content-layer losses
      local maxcloss = 0
      local mincloss = math.huge
      for _, mod in ipairs(content_losses) do
        loss = loss + mod.loss
        if maxcloss < mod.loss then maxcloss = mod.loss end
        if mincloss > mod.loss then mincloss = mod.loss end
      end

      -- pull each content layer's strength toward the min-loss layer,
      -- more strongly as the run progresses (ratio -> 1)
      for _, mod in ipairs(content_losses) do
        mod.strength = params.content_weight * (1 - (1 - mincloss / mod.loss) * ratio)
      end
    end

    -- collect the style losses and rescale each layer's strength in
    -- proportion to its share of the largest style loss
    local slosses = torch.Tensor(#style_losses)
    for i, mod in ipairs(style_losses) do
      loss = loss + mod.loss
      slosses[i] = mod.loss
    end
    local maxloss = torch.max(slosses)
    for _, mod in ipairs(style_losses) do
      mod.strength = params.style_weight * mod.loss / maxloss
    end

    maybe_print(num_calls, loss)
    maybe_save(num_calls)

    -- remainder as in neural-style's feval: return the loss and the
    -- flattened gradient for the optimizer
    return loss, grad:view(grad:nElement())
  end
bmaltais commented 8 years ago

OK, forget about ratio-based. I played with this more and I think I've nailed how to make use of the content/style balancing. Here is the "final" code:

-- Balancing options. Acceptable values are between 0 and 1:
-- 0 means no balancing and 1 means full balancing.
cmd:option('-style_max_bal', 0)
cmd:option('-style_min_bal', 0)
cmd:option('-content_max_bal', 0)
cmd:option('-content_min_bal', 0)

  local function feval(x)
    num_calls = num_calls + 1
    net:forward(x)
    local grad = net:updateGradInput(x, dy)
    local loss = 0

    if (num_calls > 1) and (params.content_weight > 0) then
      -- find the smallest and largest content-layer losses
      local maxcloss = 0
      local mincloss = math.huge
      for _, mod in ipairs(content_losses) do
        loss = loss + mod.loss
        if maxcloss < mod.loss then maxcloss = mod.loss end
        if mincloss > mod.loss then mincloss = mod.loss end
      end

      local cminrate = params.content_min_bal
      local cmaxrate = params.content_max_bal

      -- blend each content strength toward the min- and max-balanced
      -- targets by the configured rates
      for _, mod in ipairs(content_losses) do
        mod.strength = mod.strength * (1 - cminrate)
                       + params.content_weight * mincloss / mod.loss * cminrate
        mod.strength = mod.strength * (1 - cmaxrate)
                       + params.content_weight * mod.loss / maxcloss * cmaxrate
      end
    end

    local slosses = torch.Tensor(#style_losses)
    for i, mod in ipairs(style_losses) do
      loss = loss + mod.loss
      slosses[i] = mod.loss
    end

    local maxloss = torch.max(slosses)
    local minloss = torch.min(slosses)

    local sminrate = params.style_min_bal
    local smaxrate = params.style_max_bal

    -- same blending for the style layers
    for _, mod in ipairs(style_losses) do
      mod.strength = mod.strength * (1 - sminrate)
                     + params.style_weight * minloss / mod.loss * sminrate
      mod.strength = mod.strength * (1 - smaxrate)
                     + params.style_weight * mod.loss / maxloss * smaxrate
    end

    maybe_print(num_calls, loss)
    maybe_save(num_calls)

    -- remainder as in neural-style's feval
    return loss, grad:view(grad:nElement())
  end

You essentially use the options to apply min and/or max balancing to the content and style layers. The default is 0 (no balancing), increasing to 1 (100% balanced). Each iteration blends a layer's current strength toward a balanced target: new strength = (1 - r) * old strength + r * (weight * minloss / loss) for the min variant, and likewise with loss / maxloss for the max variant. I have had interesting results playing with them one at a time... but combining all four with different ratios might bring out different final results.

I noticed that applying 100% balancing can give really poor results... while slightly less strength (like 0.95) can produce stunning ones. Just be aware that using this at full blast might not always be good.

A good place to start is -style_min_bal 0.75. It gives good results when used with -content_weight 0 -style_weight 100000.
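
For example, with placeholder image paths, the invocation would look like:

th neural_style.lua -content_image content.jpg -style_image style.jpg \
  -init image -content_weight 0 -style_weight 100000 -style_min_bal 0.75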

Have fun exploring.

mystcat commented 8 years ago

Interesting thing about that neuron: it does detect eyes, and not necessarily human ones.

screen shot 2016-06-16 at 1 12 21 pm

mystcat commented 8 years ago

It could be one eye as well:

screen shot 2016-06-16 at 1 20 33 pm