jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Modifying neural_style.lua's verbose output format into a more CSV friendly format? #370

Open ProGamerGov opened 7 years ago

ProGamerGov commented 7 years ago

I am attempting to modify this function so that it produces a more CSV-friendly output format for graphing:

local function maybe_print(t, loss)
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      print(string.format('Iteration %d / %d', t, params.num_iterations))
      for i, loss_module in ipairs(content_losses) do
        print(string.format('  Content %d loss: %f', i, loss_module.loss))
      end
      for i, loss_module in ipairs(style_losses) do
        print(string.format('  Style %d loss: %f', i, loss_module.loss))
      end
      print(string.format('  Total loss: %f', loss))
    end
  end

The function by default creates this output:

Iteration 50 / 100
  Content 1 loss: 5779007.421875
  Style 1 loss: 16.728337
  Style 2 loss: 3375.864029
  Style 3 loss: 35580.190659
  Style 4 loss: 7511.037111
  Style 5 loss: 231.612459
  Total loss: 5825722.854470
Iteration 100 / 100
  Content 1 loss: 5779007.421875
  Style 1 loss: 16.728337
  Style 2 loss: 3375.864029
  Style 3 loss: 35580.190659
  Style 4 loss: 7511.037111
  Style 5 loss: 231.612459
  Total loss: 5825722.854470
Iteration 150 / 100
  Content 1 loss: 5779007.421875
  Style 1 loss: 16.728337
  Style 2 loss: 3375.864029
  Style 3 loss: 35580.190659
  Style 4 loss: 7511.037111
  Style 5 loss: 231.612459
  Total loss: 5825722.854470

The original output format is good for following the process visually, but it does not work well for CSV-related tasks like graphing.

I am trying to make the terminal output look like this (the Markdown table format does not need to be used in the actual code; I'm just using it here to show the layout I am aiming for):

| Iteration | Content 1 loss | Style 1 loss | Style 2 loss | Style 3 loss | Style 4 loss | Style 5 loss | Total loss |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 50 | 5779007.421875 | 16.728337 | 3375.864029 | 35580.190659 | 7511.037111 | 231.612459 | 5825722.854470 |
| 100 | 5779007.421875 | 16.728337 | 3375.864029 | 35580.190659 | 7511.037111 | 231.612459 | 5825722.854470 |
| 150 | 5779007.421875 | 16.728337 | 3375.864029 | 35580.190659 | 7511.037111 | 231.612459 | 5825722.854470 |
| 200 | 5779007.421875 | 16.728337 | 3375.864029 | 35580.190659 | 7511.037111 | 231.612459 | 5825722.854470 |

I have tried using various combinations of local variables, but I always get this error: attempt to concatenate global 'variable' (a nil value). I don't have much experience with Lua yet, but I understand that concatenating each of the variables into a single string and then printing that string is the simplest way to change the terminal output format?

In order to capture the terminal output to a text file, I append 2>&1 | tee ~/neural-style/mylog.log to the very end of the Neural-Style command I am using.
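
For reference, the full command then looks roughly like this (the image names and other parameters here are just placeholders for whatever is actually being used):

th neural_style.lua -content_image content.jpg -style_image style.jpg -print_iter 50 2>&1 | tee ~/neural-style/mylog.log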

htoyryla commented 7 years ago

Looks like a scope problem... you are accessing a local variable which does not exist in the scope where you access it, so Lua looks for a global with the same name, which does not exist either. The error then comes because you are concatenating with a nil value.
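
For example, a quick standalone snippet like this (not from neural_style.lua, just to illustrate the scoping rule) fails with the same error:

if true then
  local s = "inside"    -- 's' only exists inside this if-then block
end
print(s .. "!")         -- error: attempt to concatenate global 's' (a nil value)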

For collecting data from multiple iterations you need a variable declared in such a place that it will "live" throughout the iterations.

ProGamerGov commented 7 years ago

My code looked like this:

local a = string.format('Iteration %d / %d', t, params.num_iterations)
local b = string.format('  Content %d loss: %f', i, loss_module.loss)
local c = string.format('  Style %d loss: %f', i, loss_module.loss)
local d = string.format('  Total loss: %f', loss)

local e = a .. b  -- "concatenate"
local f = e .. c
local g = f .. d
-- Somehow print "g" to the terminal.

Then I was going to print "g" to the terminal.

I was following the information I found here: https://stackoverflow.com/questions/21782240/how-to-concatenate-strings-into-one-using-loop

I tried putting the code I wrote above into the function like this, with the intent of removing the old print(string.format(...)) lines once I got my code working:

local function maybe_print(t, loss)
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      print(string.format('Iteration %d / %d', t, params.num_iterations))
      local a = string.format('Iteration %d / %d', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        print(string.format('  Content %d loss: %f', i, loss_module.loss))
        local b = string.format('  Content %d loss: %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        print(string.format('  Style %d loss: %f', i, loss_module.loss))
        local c = string.format('  Style %d loss: %f', i, loss_module.loss)
      end
      print(string.format('  Total loss: %f', loss))
      local d = string.format('  Total loss: %f', loss)
      local e = a .. b  -- "concatenate"
      local f = e .. c
      local g = f .. d
      -- Somehow print "g" to the terminal.
    end
  end

"For collecting data from multiple iterations you need a variable declared in such a place that it will 'live' throughout the iterations."

I was/am planning to just output the data to the terminal so that I can easily capture it using tee. Converting that to CSV would then be easy, like what I did with Caffe's training outputs before I realized there was a built-in logging script.

htoyryla commented 7 years ago

It is not a problem with concatenation but with scope. You are defining local variables inside an if-then block, which means that outside this if-then block the variable ceases to exist. The same applies to a for-do block.

See here https://www.lua.org/pil/4.2.html

To solve this, you have to declare the variables at the beginning of the function. That way they will survive throughout the function (and cease to exist when the function exits).

local function maybe_print(t, loss)
    local a = ""
    local b = ""
    local c = ""
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      print(string.format('Iteration %d / %d', t, params.num_iterations))
      a = string.format('Iteration %d / %d', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        print(string.format('  Content %d loss: %f', i, loss_module.loss))
        b = string.format('  Content %d loss: %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        print(string.format('  Style %d loss: %f', i, loss_module.loss))
        c = string.format('  Style %d loss: %f', i, loss_module.loss)
      end
      print(string.format('  Total loss: %f', loss))
      local d = string.format('  Total loss: %f', loss)
      local e = a .. b  -- "concatenate"
      local f = e .. c
      local g = f .. d
      print(g)  -- print the combined string to the terminal
    end
  end

Then, instead of using so many variables, you could do

      a = string.format('Iteration %d / %d', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        print(string.format('  Content %d loss: %f', i, loss_module.loss))
        a = a .. string.format('  Content %d loss: %f', i, loss_module.loss)

Actually, a single variable is usually enough, as one can also concatenate newlines ("\n") etc. into it.
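
For example, roughly like this (just a sketch of the single-variable idea, keeping your original labels and adding "\n" between the lines):

local out = string.format('Iteration %d / %d\n', t, params.num_iterations)
for i, loss_module in ipairs(content_losses) do
  out = out .. string.format('  Content %d loss: %f\n', i, loss_module.loss)
end
for i, loss_module in ipairs(style_losses) do
  out = out .. string.format('  Style %d loss: %f\n', i, loss_module.loss)
end
out = out .. string.format('  Total loss: %f', loss)
print(out)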

ProGamerGov commented 7 years ago

@htoyryla, so this code appears to work without error:

local function maybe_print(t, loss)
    local a = ""
    local b = ""
    local c = ""
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      a = string.format('Iteration %d / %d', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format('  Content %d loss: %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format('  Style %d loss: %f', i, loss_module.loss)
      end
      a = a .. string.format('  Total loss: %f', loss)
      print(a)
    end
  end

Which results in:

Iteration 50 / 1000 Content 1 loss: 7515056.250000 Style 1 loss: 2958737.548828 Style 2 loss: 59383289.062500 Style 3 loss: 36834720.703125 Style 4 loss: 779240.936279 Style 5 loss: 19465.633392 Total loss: 107490510.134125

Now I remove the label text:

local function maybe_print(t, loss)
    local a = ""
    local b = ""
    local c = ""
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      a = string.format('%d ', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format(' %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format(' %f', i, loss_module.loss)
      end
      a = a .. string.format(' %f', loss)
      print(a)
    end
  end

And that results in:

50  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 107547864.963627
100  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 19892982.002258
150  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 14015005.483246
200  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 11786042.230940
250  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 10424066.450262
300  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 9689034.508419
350  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 9298543.469238
400  1.000000 1.000000 2.000000 3.000000 4.000000 5.000000 9169725.155449

The style/content layer index seems to be overriding the associated loss values?

ProGamerGov commented 7 years ago

It seems that I need to keep the %d along with the %f for the style and content loss values. Which is fine, as I can either use or omit those columns in a CSV file:

local function maybe_print(t, loss)
    local a = ""
    local b = ""
    local c = ""
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      a = string.format('%d ', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format(' %d %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format(' %d %f', i, loss_module.loss)
      end
      a = a .. string.format(' %f', loss)
      print(a)
    end
  end

The above code results in this output:

50  1 7522738.281250 1 2960488.037109 2 59444056.640625 3 36916854.492188 4 782248.077393 5 19509.374142 107645894.902706   
100  1 8126899.218750 1 1157280.029297 2 7763930.419922 3 2619203.796387 4 253291.763306 5 11579.954624 19932185.182285 
150  1 8272956.250000 1 700535.797119 2 3874417.236328 3 892234.497070 4 172101.814270 5 9400.653362 13921646.248150    
200  1 8359832.031250 1 424540.374756 2 2105716.552734 3 550890.014648 4 138426.864624 5 8412.817240 11587818.655252    
250  1 8372617.187500 1 247307.235718 2 1146672.546387 3 381429.405212 4 118721.660614 5 7760.557652 10274508.593082    

The modified neural_style.lua can be found here: https://gist.github.com/ProGamerGov/a8134605c89f01e5bcd88539456675b8

Though maybe I should have it automatically print the column names as a header row?
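
One simple way might be to print the header only once, on the first verbose iteration, with something along these lines (an untested sketch; the column names are just ones I made up):

if t == params.print_iter then
  local header = 'iteration'
  for i, _ in ipairs(content_losses) do
    header = header .. string.format(' content_%d', i)
  end
  for i, _ in ipairs(style_losses) do
    header = header .. string.format(' style_%d', i)
  end
  print(header .. ' total')
end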

Edit:

I made a simple graph of a test output with Excel, but I plan to throw together a simple Python graphing script.

htoyryla commented 7 years ago

The problem here

      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format(' %f', i, loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format(' %f', i, loss_module.loss)
      end

is that you specify a format for one number only, so i gets printed instead of the loss. Note that string.format takes first a template for formatting the output and then the variables to be filled into that template. In your code above, the template asks for only one float, but you pass both i and loss, so only i, not loss, gets printed.
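
You can see this with a quick standalone test; arguments beyond what the format string asks for are simply ignored:

print(string.format(' %f', 1, 1234.5))     -- " 1.000000"      (only i is formatted)
print(string.format(' %d %f', 1, 1234.5))  -- " 1 1234.500000" (both i and loss)
print(string.format(' %f', 1234.5))        -- " 1234.500000"   (loss only)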

Your solution of inserting %d prints out both i and loss, but I wonder if you really need to print the numbers from 1 to 5 on every line.

If not, remove both %d and i as follows.

      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format(' %f',  loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format(' %f', loss_module.loss)
      end
ProGamerGov commented 7 years ago

Thanks for the help; the suggested edits resolve the issue of printing the unneeded numbers:

local function maybe_print(t, loss)
    local a = ""
    local b = ""
    local c = ""
    local verbose = (params.print_iter > 0 and t % params.print_iter == 0)
    if verbose then
      a = string.format('%d ', t, params.num_iterations)
      for i, loss_module in ipairs(content_losses) do
        a = a .. string.format(' %f', loss_module.loss)
      end
      for i, loss_module in ipairs(style_losses) do
        a = a .. string.format(' %f', loss_module.loss)
      end
      a = a .. string.format(' %f', loss)
      print(a)
    end
  end

Results in:

50  7527201.562500 2959383.544922 59384783.203125 36886895.507812 783812.164307 19818.699360 107561894.682026   
100  8069984.375000 1155963.317871 7789196.777344 2680919.128418 272456.726074 12078.706741 19980599.031448 
150  8239482.031250 701030.090332 3884134.643555 912880.187988 183367.126465 10329.926491 13931224.006081   
200  8380553.906250 426734.527588 2126519.531250 550033.218384 140874.286652 8890.881300 11633606.351423    
250  8417350.000000 248732.254028 1159272.583008 404753.723145 121263.404846 7739.640713 10359111.605740
htoyryla commented 7 years ago

Good. You could now also remove the redundant lines

    local b = ""
    local c = ""

as you no longer use those variables.

htoyryla commented 7 years ago

Have you seen texture-net? It uses torch.display to show both intermediate images and a loss graph in a browser. That would be one option (real time!).

ProGamerGov commented 7 years ago

I was not aware that DmitryUlyanov's texture-net graphed the loss values in real time. I'll have to look into that, though I have had trouble using browser-based implementations with AWS.

htoyryla commented 7 years ago

It shows a single (total?) loss together with images. I have added torch.display to fast-neural-style as well, although images only, no graph so far. I like torch.display with the browser as it allows me to monitor training remotely from a laptop or iPad instead of sitting by the Linux box (it requires running torch-display so that it accepts remote clients).

ProGamerGov commented 7 years ago

@htoyryla

It seems that the content loss always rises? And the graph gets more jagged in the later iterations with each subsequent multires step.

ProGamerGov commented 7 years ago

So it looks like -normalize_gradients actually causes the content loss value to increase over time. -optimizer adam and -optimizer lbfgs create distinct loss graph patterns. Changing -style_weight and -content_weight affects the graph as you would expect.

Big 4x4 Graph Comparison Chart: https://i.imgur.com/absiOwx.png

The parameters from the graphs can be found here.

ProGamerGov commented 7 years ago

If Texture-net only shows the total loss value, then you miss out on the different loss amounts between the content and style image, as well as between different layers.

htoyryla commented 7 years ago

"If Texture-net only shows the total loss value, than you miss out on the different loss amounts between the content and style image, as well as between different layers."

Texture-net is like fast-neural-style: it trains a feed-forward network, using a VGG model to measure and minimize losses for style transfer. I guess one could, if one really wanted, add more graphs to show the losses from the different layers inside the VGG model for both content and style. I am not at all sure that would be helpful. Usually we are training with a set of content images, so the losses are either from different images, or perhaps we should calculate an average. Frankly, I don't see the point.

For me, it was a fresh experience to work with texture-nets, thinking more about which images to use in the training and how to scale them, then running training for a while looking at how the images develop, interrupting if it does not look good.

It could be that just because neural-style outputs these losses, the losses have received so much attention. People look at them and interpret them, but do they really understand what the losses are and how they work?

ProGamerGov commented 7 years ago

I am not at all sure that would be helpful. Usually we are training with a set of content images, so the losses are either from different images, or perhaps we should calculate an average. Frankly, I don't see the point.

I think it's helpful for seeing how each parameter affects each loss value, and thus the resulting effect on the total loss value.

People look at them and interpret them, but do they really understand what the losses are and how they work?

Looking around, I haven't been able to find anyone really analyzing the losses and trying to figure out what they mean. I have just seen the really obvious stuff.


So looking back to my early Neural-Style days, I noticed at the time that the output image appeared to cycle between "noisy" and "clean" as the iterations went on. I now suspect that cycle would have shown up as spikes on a loss graph.

Looking into these graph spikes:

Direct link to graph: https://i.imgur.com/PPuunSp.png

Iteration 1420-1450: https://i.imgur.com/TidQDrZ.gifv

Iteration 1390-1420: https://i.imgur.com/bzf1D9g.gifv

The changes that the image undergoes during the graph spike are interesting. These graph spikes don't seem to correspond to anything I can think of, but I do know that the graphs of more aesthetically pleasing outputs seem to be smoother, with far fewer spikes or even none at all.

Neural-Style uses your parameters to try and find the most efficient loss function. So I think that the spikes on the loss graph indicate inefficiency in the loss function.

This seems to be in-line with how some of the graphs are basically a single giant spike like this:

Direct link to graph: https://i.imgur.com/sVWAZEY.png


htoyryla commented 7 years ago

What you are looking at is the behaviour of the optimizer (l-bfgs or adam) trying to look for a minimum of total loss in a mindbogglingly multidimensional space. Each possible image corresponds to one point in that space. The space has a terrain, with hills and valleys corresponding to the total loss at each point. When researchers speak of loss function, they mean loss as a function in this multidimensional space.

Neural-style inputs an image into the model, and measures the losses from the selected layers, then the optimizer changes the image a bit in an attempt to decrease losses. Your curves show how the optimizer is performing.

That the spikes correspond to poor images is simply natural: a high total loss peak by definition means the image is far from what we expect it to be. That a spike occurs means the optimizer has missed and gone astray, but that it then recovers is a good sign.

If you would use -init random, the search would always start from a different point in the multidimensional space and the loss curves would look different, as the loss curve corresponds to the terrain crossed on the path chosen by the optimizer.

Using -init image is different, as one always starts from the same point (corresponding to the content image). Unlike with -init random, where both content and style losses start from a high value, here content and style are in effect competing. The style losses can only decrease if we move away from the content image, so the content losses will necessarily rise. But anyway, the optimizer will still be looking for a minimum of the total loss.

When you change the layer settings, the whole playground changes, meaning that the minima and maxima will be placed differently in the multidimensional loss terrain.

htoyryla commented 7 years ago

Me: I am not at all sure if that would be helpful. Usually we are training with a set of content images, so the losses are either from different images or should we perhaps calculate an average. Frankly, I don't see a point.

You: I think it's helpful for seeing how each parameters affects each loss value, and thus the resulting effect on the total loss value.

I was commenting specifically about texture_nets and other feed-forward solutions like fast-neural-style. If you believe that examining how the losses behave during optimization helps you to select parameters better, then you should be able to transfer that know-how to a feed-forward solution, which also uses a VGG model to measure losses. Meaning: if you now know which layer combinations work like you want, just go on and use them.

Which still leaves everything I wrote in my previous comment. In my view, losses are simply a tool to find an optimum, not an absolute quality measure. When we change parameters and images, the whole terrain changes together with the starting point and the optimum. The losses observed simply give the height of the terrain as the optimizer moves across the terrain in search of the lowest point.