Open htoyryla opened 8 years ago
Interesting. I tried your code with my usual first test: the neural-style defaults, and found that it did not equalize the layers at all. The strength for all but one layer dropped to 1. I did not run this longer than iteration 100 as I am busy with other things for the rest of the week.
Oh yes. Your code is actually doing the opposite of equalization: the weak layers' strengths are pushed down. This is more like selecting the strong layers and suppressing the others, at least with my test case.
Your case might be different. This is why I like experimenting with code, not for finding any final truth or universal solution, but for working results, even if it might be for one case only.
(mod.strengths printed first:)
1 1
2 1
3 1
4 100
5 1
Iteration 75 / 1000
Content 1 loss: 858358.671875
Style 1 loss: 234.571411
Style 2 loss: 1713.968994
Style 3 loss: 346.774841
Style 4 loss: 243431.494141
Style 5 loss: 4.711990
Total loss: 1104090.193253
Looking back, I made convis as a quick way to see what happens inside the model, not as a tool for research or for evaluating models. Maybe my expectations ran wild when the images looked so nice, and for the lower layers also informative. But again, it shows where the activations are; one just needs to keep in mind not to read too much into that.
I've had @bmaltais's code running here with neural-style defaults but changing style weights. It responds very well to these changes, clearly better than my original from yesterday. The images produced look OK to me, but I have not compared them with anything. In general, the idea of suppressing weak layers resonates with my practice of using fewer layers, usually 2 to 3. This code does the selection automatically.
BTW, another interesting question would be how to maintain appearance while increasing image size. The question has been raised here a couple of times, but I have not heard of a solution. To me, it feels that the solution does not lie in manipulating weights. Rather, it somehow derives from the change in the effective area covered by each kernel, perhaps similar to the change in the convis highlights when @mystcat increased the image size.
I am wondering if a similar approach for content might also help when combined with style weight balancing. It would also be possible to favor content layers with high loss (or possibly low loss in the case of content) using autotuned content weight... Might be worth testing.
In my case it won't be of much help since I tend to use a content weight of 0 along with an image init instead of noise... but for noise init it might make a difference.
Update:
original with only balanced weight for max loss vs style balanced for max loss and content balanced for min loss: https://twitter.com/netputing/status/743065822457794560
Hannu,
I just got a little insight. It should be obvious to somebody, but anyway, here it is: neurons "like" features of the source image only at a particular scale, the scale at which they saw them when the model was trained. For example, neuron 62 on layer 5_1 definitely reacts to eyes and mouth, but it doesn't react that way when I feed the same image at a smaller scale. Here are 3 examples: 224px:
448px:
896px:
P.S.: I also changed convis so that it shows the real size of a neuron's activation translated to the size of the fed image.
Now I'm not so sure anymore about neuron 5_1_62. There are a lot of other activations throughout the image that do not fit the hypothesis.
One more test regarding the activation of a neuron to the same image at different scales. Neuron 5_2-426. It likes eyes mostly, but also lips to a lesser extent. I intentionally used an art picture, so the reaction should reflect concept recognition rather than image patterns.
150px:
224px:
300px:
350px:
400px:
500px:
896px (activations are very weak at this scale):
Apparently neuron activation becomes weaker when the scale of the feature increases beyond a particular point.
This is the most probable cause of the problem with higher-resolution images that was discussed here before. The model just doesn't scale up, because it stops working when the feature size increases. Adding more memory and using high-end GPUs won't help; higher-resolution images will look different anyway.
bmaltais notifications@github.com wrote on 15.6.2016 at 15:33:
I am wondering if a similar approach for content might also help when combined with style weight balancing. It would also be possible to favor content layers with high loss (or possibly low loss in the case of content) using an autotuned content weight... Might be worth testing.
It should be doable in a similar fashion.
I didn't originally implement layer-specific weight for content, as I most often use only one layer, but others have different approaches.
In my case it won't be of much help since I tend to use a content weight of 0 along with an image init instead of noise... but for noise init it might make a difference.
I was surprised to see content_weight 0 earlier, but I get it… there are indeed many approaches, and they can lead to different ways of using weights.
mystcat notifications@github.com wrote on 15.6.2016 at 17:45:
Hannu,
I just got a little insight. It should be obvious to somebody, but anyway, here it is: neurons "like" features of the source image only at a particular scale, the scale at which they saw them when the model was trained. For example, neuron 62 on layer 5_1 definitely reacts to eyes and mouth, but it doesn't react that way when I feed the same image at a smaller scale.
Yes, I guess this is more or less the same effect that changes the neural-style results when image size is increased.
I would have thought, though, that VGG19 was trained using 224px images, while what you show would indicate larger training images.
Concerning facial features, this looks interesting: https://www.technologyreview.com/s/601684/machine-vision-algorithm-learns-to-transform-hand-drawn-sketches-into-photorealistic-images/?utm_campaign=socialflow&utm_source=facebook&utm_medium=post. I have also been thinking that models which have actually been trained to generate images (as in this one) might work better than those trained for classification. Maybe.
Have to stop now, and might not be able to respond before Sunday or Monday.
@htoyryla Here is the code for the content weight. It is ratio-based so that the balancing is applied progressively, as applying it at full strength from the first iteration resulted in bad images:
local function feval(x)
  num_calls = num_calls + 1
  net:forward(x)
  local grad = net:updateGradInput(x, dy)
  local loss = 0
  local ratio = num_calls / params.num_iterations
  if ((num_calls > 1) and (params.content_weight > 0)) then
    -- find the smallest and largest content layer losses
    local maxcloss = 0
    local mincloss = 100000000
    for _, mod in ipairs(content_losses) do
      loss = loss + mod.loss
      if (maxcloss < mod.loss) then
        maxcloss = mod.loss
      end
      if (mincloss > mod.loss) then
        mincloss = mod.loss
      end
    end
    -- progressively pull each layer's strength toward the min-loss target
    for i, mod in ipairs(content_losses) do
      mod.strength = params.content_weight * (1 - ((1 - (mincloss / mod.loss)) * ratio))
    end
  end
  local slosses = torch.Tensor(#style_losses)
  for i, mod in ipairs(style_losses) do
    loss = loss + mod.loss
    slosses[i] = mod.loss
  end
  -- scale each style layer's strength by its share of the largest loss
  local maxloss = torch.max(slosses)
  for i, mod in ipairs(style_losses) do
    mod.strength = params.style_weight * mod.loss / maxloss
  end
  maybe_print(num_calls, loss)
  maybe_save(num_calls)
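The style-balancing step above can be sketched numerically. This is a Python illustration of the same formula, not the Torch code itself; the layer losses are made-up example values in the spirit of the printout earlier in the thread:

```python
# Sketch of the max-loss style balancing: each layer's strength is
# scaled by its share of the largest layer loss, so the strongest
# layer keeps full weight and weak layers are suppressed.
style_weight = 100.0
layer_losses = [234.6, 1714.0, 346.8, 243431.5, 4.7]  # example values

max_loss = max(layer_losses)
strengths = [style_weight * loss / max_loss for loss in layer_losses]

for i, s in enumerate(strengths, start=1):
    print(f"layer {i}: strength = {s:.4f}")
```

With these example losses the dominant layer 4 keeps the full strength of 100, while layer 5 drops to roughly 0.002, which matches the "selecting the strong layers and suppressing the others" behavior observed above.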
OK. Forget about the ratio-based version. I played with this some more and I think I nailed how to make use of the content/style balancing. Here is the "final" code:
-- Balancing options. Acceptable values are between 0 and 1: 0 means no balancing and 1 means full balancing
cmd:option('-style_max_bal', 0)
cmd:option('-style_min_bal', 0)
cmd:option('-content_max_bal', 0)
cmd:option('-content_min_bal', 0)
local function feval(x)
  num_calls = num_calls + 1
  net:forward(x)
  local grad = net:updateGradInput(x, dy)
  local loss = 0
  local ratio = num_calls / params.num_iterations
  if ((num_calls > 1) and (params.content_weight > 0)) then
    local maxcloss = 0
    local mincloss = 100000000
    for _, mod in ipairs(content_losses) do
      loss = loss + mod.loss
      if (maxcloss < mod.loss) then
        maxcloss = mod.loss
      end
      if (mincloss > mod.loss) then
        mincloss = mod.loss
      end
    end
    local cminrate = params.content_min_bal
    local cmaxrate = params.content_max_bal
    -- blend the current strength with the min- and max-loss targets
    for i, mod in ipairs(content_losses) do
      mod.strength = (mod.strength * (1 - cminrate)) + (params.content_weight * mincloss / mod.loss * cminrate)
      mod.strength = (mod.strength * (1 - cmaxrate)) + (params.content_weight * mod.loss / maxcloss * cmaxrate)
    end
  end
  local slosses = torch.Tensor(#style_losses)
  for i, mod in ipairs(style_losses) do
    loss = loss + mod.loss
    slosses[i] = mod.loss
  end
  local maxloss = torch.max(slosses)
  local minloss = torch.min(slosses)
  local sminrate = params.style_min_bal
  local smaxrate = params.style_max_bal
  for i, mod in ipairs(style_losses) do
    mod.strength = (mod.strength * (1 - sminrate)) + (params.style_weight * minloss / mod.loss * sminrate)
    mod.strength = (mod.strength * (1 - smaxrate)) + (params.style_weight * mod.loss / maxloss * smaxrate)
  end
  maybe_print(num_calls, loss)
  maybe_save(num_calls)
You essentially use the options to apply balancing (min/max) to the content/style layers. The default is 0 (no balancing), and it can be increased to 1 (100% balanced). I have had interesting results playing with them one at a time... but combining all 4 with different ratios might bring out different final results.
I noticed that applying 100% balancing might produce a really poor result... but a slightly lower strength (like 0.95) can produce stunning results. Just be aware that using this at full blast might not always be good.
A good place to start is -style_min_bal 0.75. It gives good results when using -content_weight 0 -style_weight 100000.
Have fun exploring.
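The blended update in the "final" code can be illustrated with a small numeric sketch. This is Python rather than the Lua above, and the loss values are invented for illustration; only the blending formula itself is taken from the code:

```python
# Sketch of the blended min-balancing update: the new strength is a
# linear mix of the current strength and the loss-ratio target,
# controlled by a rate in [0, 1] (0 = no balancing, 1 = full).
style_weight = 100000.0
min_bal = 0.75  # e.g. -style_min_bal 0.75

def balanced_strength(strength, loss, min_loss, rate, weight):
    target = weight * min_loss / loss  # min-balancing target for this layer
    return strength * (1 - rate) + target * rate

losses = [500.0, 50.0, 5.0]  # example per-layer style losses
min_loss = min(losses)
strengths = [balanced_strength(style_weight, l, min_loss, min_bal, style_weight)
             for l in losses]
print(strengths)  # → [25750.0, 32500.0, 100000.0]
```

Note how min-balancing leaves the lowest-loss layer at full strength and damps the high-loss layers, i.e. it pulls the layers toward equal contributions, while the max variant does the opposite.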
Interesting thing about that neuron: it does detect eyes, and not necessarily human ones.
It could be one eye as well:
I have for a long time been interested in the idea of adjusting the weights of the style layers separately. For instance, if I see that some layer is having far larger or smaller losses than is typical, it would be interesting to be able to adjust the weights.
I have now implemented this in https://github.com/htoyryla/neural-style by adding a new parameter -style_layer_weights, which accepts a comma-separated list of multipliers. The weight of each style layer will then be multiplier * style_weight. If the parameter is omitted, each layer will have multiplier = 1.
I assume that this is not for the average user, and that's why I didn't make this a pull request. But there might be other experiment-oriented users around, so I am letting them know that this exists.
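For illustration, the behavior of such a parameter could be sketched as follows. The parameter name -style_layer_weights matches the description above, but this parsing code is a hypothetical Python sketch, not the actual Lua implementation in the repo:

```python
# Hypothetical sketch: parse a comma-separated multiplier list (as in
# -style_layer_weights "0.5,1,2") and derive per-layer weights as
# multiplier * style_weight; an omitted parameter means multiplier 1.
def layer_weights(style_weight, num_layers, style_layer_weights=None):
    if style_layer_weights is None:
        multipliers = [1.0] * num_layers
    else:
        multipliers = [float(m) for m in style_layer_weights.split(",")]
        assert len(multipliers) == num_layers, "one multiplier per style layer"
    return [m * style_weight for m in multipliers]

print(layer_weights(100.0, 3, "0.5,1,2"))  # → [50.0, 100.0, 200.0]
```

Unlike the automatic balancing discussed earlier in the thread, these multipliers are fixed for the whole run, so they suit deliberate experimentation rather than self-tuning.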