jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

lbfgs failing with "function value changing less than tolX" when in GPU mode #52

Open raptorecki opened 8 years ago

raptorecki commented 8 years ago

This code is some marvelous work! I'm stunned by the amazing results it can give.

I happened to encounter several small issues that maybe someone would be able to help me with.

I noticed some strange interruptions when rendering an image with the lbfgs optimizer. They show up like this:

(...) Iteration 740 / 1000 Content 1 loss: 2133260.000000 Style 1 loss: 33612.725830 Style 2 loss: 1111955.078125 Style 3 loss: 456756.542969 Style 4 loss: 19963398.437500 Style 5 loss: 875.515652 Total loss: 23699858.300076

function value changing less than tolX

The moment it happens is always completely random. I played with weights and other parameters, but the funny thing is that it doesn't matter: I can render the same style and source with the same settings several times in a row and it will eventually fail like this, or if it keeps failing it will eventually go through. When I render a JPG sequence using a simple bash loop (with the same style and settings), it usually fails once or twice every ten frames and moves on. I can then render the failed frames again with the same settings and they finish fine.

Exploring this issue a bit, I tried to modify the local tolX value in torch/install/share/lua/5.1/optim/lbfgs.lua, or even comment out the entire if abs(f-f_old) < tolX then break check. But then, if the loss value stops changing during rendering, I end up with black frames.
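
For reference, the check that produces the message looks roughly like this (paraphrased from optim/lbfgs.lua, not verbatim; tolX defaults to 1e-9 and is read from the config table passed to optim.lbfgs):

```lua
-- Paraphrase of the early-exit check in torch/optim's lbfgs.lua (not verbatim):
local tolX = config.tolX or 1e-9   -- termination tolerance on progress

-- ... inside the optimization loop ...
if math.abs(f - f_old) < tolX then
   -- the objective barely changed between iterations, so L-BFGS stops early
   verbose('function value changing less than tolX')
   break
end
```

So instead of editing the installed file, the same effect should be achievable by setting tolX in the state table that neural_style.lua passes to optim.lbfgs (I assume that is the optim_state table), e.g. optim_state.tolX = 0 — though, as described above, that only trades the early stop for black frames when the loss genuinely stops changing.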

(...)
Iteration 990 / 1000 Content 1 loss: 4483204.062500 Style 1 loss: 30816.199951 Style 2 loss: 3080145.585938 Style 3 loss: 2145852.421875 Style 4 loss: 69358680.000000 Style 5 loss: 6758.217773 Total loss: 79105456.488037
Iteration 995 / 1000 Content 1 loss: 4483204.062500 Style 1 loss: 30816.199951 Style 2 loss: 3080145.585938 Style 3 loss: 2145852.421875 Style 4 loss: 69358680.000000 Style 5 loss: 6758.217773 Total loss: 79105456.488037
Iteration 1000 / 1000 Content 1 loss: 4483204.062500 Style 1 loss: 30816.199951 Style 2 loss: 3080145.585938 Style 3 loss: 2145852.421875 Style 4 loss: 69358680.000000 Style 5 loss: 6758.217773 Total loss: 79105456.488037

reached max number of iterations

What is worth mentioning, this happens only in GPU mode for both nn and cudnn backends. If I could understand why it is happening, I would love to investigate it a bit more.

I'm also curious about the memory limitations. I use, for example, -image_size 640 and according to nvidia-smi that uses 1221MiB/2046MiB, so it seems there is plenty left. But when I proceed with -image_size 641 it fails with the familiar cuda runtime error (2) : out of memory. Of course the values vary with different styles and sources. Idle state uses 45MiB/2046MiB. Could anyone explain what is preventing the library from using the remaining memory? In CPU mode I can use my entire 16G with no problem.
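
If it helps anyone who wants to dig into this, a small probe like the following (a hypothetical snippet, assuming the cutorch package and device 1) should show how much memory cutorch itself reports as free at the failing size; my guess is that fragmentation, plus the allocations the model already holds, can leave no single free block large enough even when nvidia-smi still shows headroom:

```lua
-- Hypothetical probe: report free vs. total GPU memory as cutorch sees it.
require 'cutorch'

local freeBytes, totalBytes = cutorch.getMemoryUsage(1)  -- device index 1
print(string.format('GPU memory free: %.0f MiB of %.0f MiB',
                    freeBytes / 2^20, totalBytes / 2^20))
```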

I'm rendering on a GTX 770 (2GB of GPU RAM) in GPU mode and an i7 4790k (16GB of RAM) in CPU mode, using Ubuntu 14.04.2, Nvidia 352.39, CUDA 7.5.18-19867135 and CUDNN 7.0.

Again, results of this code are just mind blowing. Thank you for sharing this, jcjohnson!

raptorecki commented 8 years ago

So... Despite the issues mentioned above I managed to push out some visuals for my music.

You can see the results here

First of all, big thanks to @jcjohnson for this code and @hughperkins for contributions (-seed saved me a lot of trouble).

I would like to share some of my experiences doing those tests:

Some processing info:

Iteration 1 / 600 Content 1 loss: 12688274.218750 Style 1 loss: 1513260.009766 Style 2 loss: 261467156.250000 Style 3 loss: 76750781.250000 Style 4 loss: 2047350875.000000 Style 5 loss: 28623.922348 Total loss: 2399798970.650864

creating recyclable direction/step/history buffers
Iteration 2 / 600 Content 1 loss: 12688274.218750 Style 1 loss: 1513260.009766 Style 2 loss: 261467156.250000 Style 3 loss: 76750781.250000 Style 4 loss: 2047350875.000000 Style 5 loss: 28623.922348 Total loss: 2399798970.650864
function value changing less than tolX

- A single 1920px frame took about 58GB of RAM and 5 hours on 12 cores of an Intel Xeon E5-2650 to process. I deemed it unfeasible to use for a video, but it was interesting to see how far I could push it.
- On my home GTX 770 with 2GB of VRAM I could process frames of about 600-620px at a rate of 1 frame per 3 minutes (~20 frames/hour).

Processing and post production thoughts:

Thanks guys, all the best, have fun!

hughperkins commented 8 years ago

What is worth mentioning, this happens only in GPU mode for both nn and cudnn backends.

It would be interesting to know whether it's also present with clnn. If it is, then it points to something in the code-base; if not, it could be something in the driver. There's nothing in particular about GPUs that should make the numbers radically different from the CPU, other than that GPUs use 32-bit floats.

Hmmm... did you try CPU with 32-bit floats? i.e., cast everything to :float()?
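
Something like this sketch, I mean (variable names net, content_image, style_image assumed from neural_style.lua; not a drop-in patch):

```lua
-- Sketch: run on CPU but with 32-bit floats, to check whether the early
-- tolX exit is a float32 precision effect rather than something GPU-specific.
-- Variable names are assumed from neural_style.lua.
torch.setdefaulttensortype('torch.FloatTensor')

net = net:float()                      -- cast the loaded network
content_image = content_image:float()  -- cast the input images as well
style_image = style_image:float()
```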