OpenCL support

napsternxg commented 9 years ago

I tried implementing OpenCL support and the code is at: https://github.com/napsternxg/neural-style/tree/opencl

However I get the following error when running the code:

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -backend 'clnn' -output_image profile.png
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message.  If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Turks
/home/torch/install/bin/luajit: C++ exception

I believe the issue is caused by SpatialConvolutionMM, which is implemented in the ccn2 module.

napsternxg commented 9 years ago

Here is the output:

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -backend 'clnn' -output_image profile.png -image_size 25 -model_file models/vgg_normalised.caffemodel -optimizer adam
In Function main
Starting load model
In loadcaffe_load
Successfully loaded models/vgg_normalised.caffemodel
Finished proto to lua   
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
Finished iterations     clnn
Finished network setup  
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Turks
Finished content Image preprocess
Finished style Image preprocess 
Finished caffe variables
Starting network setup  
input:size()
  3
 25
 19
[torch.LongStorage of size 3]

currentOutput:size()
  3
 25
 19
[torch.LongStorage of size 3]

self.modules[   1       ]=      nn.TVLoss
currentOutput:size()
  3
 25
 19
[torch.LongStorage of size 3]

self.modules[   2       ]=      nn.SpatialConvolutionMM(3 -> 64, 3x3, 1,1, 1,1)
Apply_1t_1s_0pt_-2_*out = val1 build log: 
"/tmp/OCL19013T5.cl", line 53: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

currentOutput:size()
 64
 25
 19
[torch.LongStorage of size 3]

self.modules[   3       ]=      nn.ReLU
Apply_1t_0s_0pt_-2_*out = (*out > 0) ? *out : 0 build log: 
"/tmp/OCL19013T19.cl", line 49: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

input:size()
 64
 25
 19
[torch.LongStorage of size 3]

currentOutput:size()
 64
 25
 19
[torch.LongStorage of size 3]

self.modules[   1       ]=      nn.View
currentOutput:size()
  64
 475
[torch.LongStorage of size 2]

self.modules[   2       ]=      nn.ConcatTable {
  input
    |`-> (1): nn.Identity
    |`-> (2): nn.Identity
     ... -> output
}
/home/username/Downloads/torch/install/bin/luajit: .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:45: attempt to call method 'size' (a nil value)
stack traceback:
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:45: in function 'forward'
        neural_style_opencl.lua:150: in function 'main'
        neural_style_opencl.lua:424: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

hughperkins commented 9 years ago

(I'm trying to install neural-style by the way, hence the pause in my replies :-P )

hughperkins commented 9 years ago

Cool. I can replicate the problem on my machine:

self.modules[   2   ]=  nn.ConcatTable {
  input
    |`-> (1): nn.Identity
    |`-> (2): nn.Identity
     ... -> output
}
/home/user/torch/install/bin/luajit: /home/user/torch/install/share/lua/5.1/nn/Sequential.lua:45: attempt to call method 'size' (a nil value)
stack traceback:

hughperkins commented 9 years ago

Ah, might be missing nn.MM module :-P

(Edit: seems like ConcatTable works ok:

a = nn.ConcatTable()
a:add(nn.Linear(3,2))
a:add(nn.Linear(3,2))
A = torch.Tensor(3):uniform()
-- a:forward(A)[1]
-- -0.2971
--  0.1385
-- [torch.DoubleTensor of size 2]
a:forward(A)[2]
-- similar output

acl = a:clone():cl()
torch.type(acl.modules[1].weight)
-- torch.ClTensor
Acl = A:cl()
acl:forward(Acl)[1]
-- -0.2971
--  0.1385
-- [torch.ClTensor of size 2]

) (Edit 2: and GramMatrix seems to work ok actually. First revert the Sequential.lua changes, then do:

th
require 'nn'
function GramMatrix()
  local net = nn.Sequential()
  net:add(nn.View(-1):setNumInputDims(2))
  local concat = nn.ConcatTable()
  concat:add(nn.Identity())
  concat:add(nn.Identity())
  net:add(concat)
  net:add(nn.MM(false, true))
  return net
end
g = GramMatrix()
g:forward(torch.Tensor(3,2,4):uniform())
-- works ok

require 'clnn'
gcl = g:clone():cl()
gcl:forward(torch.ClTensor(3,2,4):uniform())
-- works ok

)

napsternxg commented 9 years ago

Am I missing the nn.MM module ?

hughperkins commented 9 years ago

Ok, the problem is right at the start of the network. Basically, if you put the following at line 261, you can see the network:

print('net', net)

Then, there are lots of layers, but the first two are:

  (1): nn.TVLoss
  (2): nn.SpatialConvolutionMM(3 -> 64, 3x3, 1,1, 1,1)

Now, if you hack the Sequential.lua file with the following:

function Sequential:updateOutput(input)
   if input == nil then
     print('input nil')
   else
     if input.size ~= nil then
       print('input:size()', input:size())
     else
       print('input.size nil')
     end
   end
   local currentOutput = input
   for i=1,#self.modules do
      print('self.modules[', i , ']=', self.modules[i])
      if currentOutput == nil then
        print('currentoutput nil')
      else
        if currentOutput.size ~= nil then
          print('currentOutput:size()', currentOutput:size())
        else
          print('currentoutput.size is nil')
        end
      end
      currentOutput = self.modules[i]:updateOutput(currentOutput)
   end
   self.output = currentOutput
   return currentOutput
end

... then you will get the following output:

self.modules[   1   ]=  nn.TVLoss
currentOutput:size()    
 1425
[torch.LongStorage of size 1]

self.modules[   2   ]=  nn.SpatialConvolutionMM(3 -> 64, 3x3, 1,1, 1,1)
currentOutput:size()    
 1425
[torch.LongStorage of size 1]

The output of TVLoss is a 1-dimensional tensor of length 1425, but SpatialConvolutionMM (at least the OpenCL version, for now....) expects a 3- or 4-dimensional tensor. Now, in theory, we can add a Reshape layer into the network at line 108, something like:

  net:add(nn.Reshape(3, 25, 25))

... but strangely 3 * 25 * 25 == 1875 != 1425, so that's a bit odd. I'm not sure why these lengths mismatch yet, but I'm pretty sure that the problem is a tensor size/shape mismatch between the TVLoss output and the following SpatialConvolutionMM layer input.

(Edit: Hmmm, actually, not quite this, since just after printing the network, this runs ok:

Running optimization with ADAM  
input:size()    
  3
 25
 19
[torch.LongStorage of size 3]

self.modules[   1   ]=  nn.TVLoss
currentOutput:size()    
  3
 25
 19
[torch.LongStorage of size 3]

self.modules[   2   ]=  nn.SpatialConvolutionMM(3 -> 64, 3x3, 1,1, 1,1)
currentOutput:size()    
  3
 25
 19
[torch.LongStorage of size 3]

The crash comes later. Maybe the incoming image is too small, and then after a few poolings it is 1x1? Kind of a mystery :-P )
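
A quick check of the element counts helps with the length puzzle above. The sketch below is plain Lua, not Torch, and `numel` is a hypothetical helper introduced only for illustration. It suggests that 1425 is what a flattened 3 x 25 x 19 tensor would have (the sizes printed by input:size() in this thread), whereas Reshape(3, 25, 25) would need 1875 elements:

```lua
-- Number of elements in a tensor with the given dimensions.
local function numel(dims)
  local n = 1
  for _, d in ipairs(dims) do
    n = n * d
  end
  return n
end

local image_shape = {3, 25, 19}   -- sizes printed by input:size() in this thread
local guessed_shape = {3, 25, 25} -- shape assumed by the Reshape suggestion

print(numel(image_shape))   -- 1425
print(numel(guessed_shape)) -- 1875
```

So the 1-d tensor of length 1425 looks like a flattened copy of the image rather than a differently sized one, which would fit with the shape, not the data, being what goes wrong.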

hughperkins commented 9 years ago

I reckon we should try with a smaller model first. Any suggestions for a really small model to try? The goal is not to get good image output, just to check that it runs ok; we can try a larger model later.

hughperkins commented 9 years ago

(e.g. maybe an MNIST LeNet-5 or something like that, perhaps?)

napsternxg commented 9 years ago

Try with vgg_normalised.caffemodel; that is the small model I tried working with. I don't know of any smaller model.

hughperkins commented 9 years ago

vgg normalized is physically small(er), but it's still got 19 layers. lenet-5 has like 5 layers or so.

hughperkins commented 9 years ago

Don't think we need pretrained weights for now. Sufficient just to leave the weights initialized with random numbers.

hughperkins commented 9 years ago

ok, I hacked lines 62 or so of the neural_style_opencl.lua script, to have a single max pooling layer:

  -- local cnn = loadcaffe_wrap.load(params.proto_file, params.model_file, params.backend):float()
  cnn = nn.Sequential()
  cnn:add(nn.SpatialMaxPooling(2,2,2,2,0,0))
  cnn:float()

with default backend, runs ok
with backend clnn, get similar error to above:

self.modules[   2   ]=  nn.SpatialMaxPooling(2,2,2,2)
currentOutput:size()    
 36
[torch.LongStorage of size 1]

/home/user/torch/install/bin/luajit: ...ser/torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:36: bad argument #2 to 'SpatialMaxPooling_updateOutput' (3D or 4D (batch) tensor expected)
stack traceback:

Running like this:

th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -backend 'clnn' -output_image profile.png -image_size 4 -model_file models/vgg_normalised.caffemodel -optimizer adam

Edit: full output in failed case is now relatively short, so easy to compare with non-cl output. cl output is: http://pastebin.com/XYximE1f (and for cpu, is http://pastebin.com/pPx23RBr )

(Edit 2: if you modify the feval function as follows:

  local function feval(x)
    print('feval x:size()', x:size())
    num_calls = num_calls + 1
    net:forward(x)
    local grad = net:backward(x, dy)
    local loss = 0
    for _, mod in ipairs(content_losses) do
      loss = loss + mod.loss
    end
    for _, mod in ipairs(style_losses) do
      loss = loss + mod.loss
    end
    maybe_print(num_calls, loss)
    maybe_save(num_calls)

    collectgarbage()
    -- optim.lbfgs expects a vector for gradients
    print('loss', loss)
    print('grad:size()', grad:size())
    return loss, grad:view(grad:nElement())
  end

... then you will notice that:

the first entry to this function, the x tensor is 3d
the second time is 1d
at the end of this function, the grad tensor is 3d, correctly
... but returned as 1d, using the :view() function
problem is somehow linked to this?

I was briefly concerned that the :view() function was broken in cl, but seems not to be:

a = torch.ClTensor(3,4,5):uniform()
a:view(3*4*5)
-- shows a 1d tensor
a
-- continues to show a 3d tensor, ie hasnt unintentionally modified the original a tensor

)

hughperkins commented 9 years ago

Ah, looks like :addcdiv in cltorch reshapes the tensor, but in torch and cutorch does not. In adam.lua, line 63:

x:addcdiv(-stepSize, state.m, state.denom)

... causes the tensor to suddenly change from 3d to 1d. I need to look into this.
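
For reference, addcdiv is meant to be a purely elementwise update, x = x + value * (tensor1 / tensor2), so only the element counts should matter and the destination should keep its own shape. The toy below is plain Lua, not the cltorch implementation; `new_tensor` and this `addcdiv` are hypothetical stand-ins that store a flat data array next to a separate shape table, and the 1-d operands assume optimizer state built from the flattened gradient that feval returns:

```lua
-- Toy tensor: a flat data array plus a separate shape table.
local function new_tensor(shape, data)
  return {shape = shape, data = data}
end

-- x = x + value * (t1 / t2), elementwise over the flat data.
-- Only the element counts must match; x.shape is left untouched.
local function addcdiv(x, value, t1, t2)
  assert(#x.data == #t1.data and #x.data == #t2.data, "element count mismatch")
  for i = 1, #x.data do
    x.data[i] = x.data[i] + value * t1.data[i] / t2.data[i]
  end
  return x
end

local x = new_tensor({1, 2, 2}, {1, 2, 3, 4}) -- 3-d destination
local m = new_tensor({4}, {2, 4, 6, 8})       -- 1-d operands (assumed optimizer state)
local denom = new_tensor({4}, {2, 2, 2, 2})

addcdiv(x, -0.5, m, denom)
print(#x.shape, table.concat(x.data, " "))
```

The point of the sketch is the invariant: after the call, x still has three dimensions. A backend that leaves x with the operands' 1-d shape breaks any layer that later expects a 3-d image.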

hughperkins commented 9 years ago

Hmmm... but ... cutorch does the same thing actually:

require 'cutorch'
a = torch.CudaTensor(3,2,4):uniform()
b = torch.CudaTensor(3*2*4):uniform()
a:addcdiv(1,b,b)
a:size()
-- 1 dimension...

Edit: ah, most recent cutorch fixes this :-)

hughperkins commented 9 years ago

Ok. Please update to latest cltorch, ie luarocks install cltorch, and then retry. For me, the following command runs ok to completion:

th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -image_size 32 -model_file models/vgg_normalised.caffemodel -optimizer adam -num_iterations 3 -backend clnn

vkorablin commented 9 years ago

@hughperkins

Same here. :+1:

hughperkins commented 9 years ago

Cool :-)

hughperkins commented 9 years ago

Hi. I've updated cltorch to allow get/set on individual elements. So, lbfgs might work now. Please feel free to luarocks install cltorch, and see to what extent lbfgs works for you.

hughperkins commented 9 years ago

Hi Shubhanshu, I've added your OpenCL port to the clnn readme by the way :-) https://github.com/hughperkins/clnn#example-networks

napsternxg commented 9 years ago

@hughperkins thanks a lot. Yes this runs and generates images. Really appreciate adding my example on the link.

Although, I think because of the limited memory of my GPU, I can't generate any reasonable output even after using -image_size 150. This was the full command:

th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -image_size 150 -model_file models/vgg_normalised.caffemodel -backend clnn

But I am glad that it will run for someone with a better GPU.

Here are some of the images I got: [images: profile, profile_100, profile_200, profile_300, profile_400, profile_500, profile_600, profile_700, profile_800, profile_900]

I believe using the larger model is the best bet. Maybe @jcjohnson can elaborate on this.

napsternxg commented 9 years ago

I am trying to run it with the nin_imagenet_conv model and I am getting the error about SpatialAveragePooling_updateOutput not implemented.

Here is the command:

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -model_file models/nin_imagenet_conv.caffemodel -proto_file models/solver.prototxt -backend clnn

Here is the error:

/home/username/Downloads/torch/install/bin/luajit: ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: Not implemented at /tmp/luarocks_clnn-scm-1-2534/clnn/SpatialAveragePooling.cpp:59
stack traceback:
        [C]: in function 'SpatialAveragePooling_updateOutput'
        ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: in function 'updateOutput'
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural_style_opencl.lua:149: in function 'main'
        neural_style_opencl.lua:424: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

hughperkins commented 9 years ago

For the images, per the FAQ, I reckon that big blocks of continuous color mean your TV weight is too high. @jcjohnson Is that right?

hughperkins commented 9 years ago

For the nin_imagenet_conv, can you state the parameters of the averagepooling layer? ie, what is the pool size, the input size, and the stride?

hughperkins commented 9 years ago

Hi Shubhanshu, for nin_imagenet_conv, I've updated clnn to handle a very specific averagepooling non-batched geometry. Can you luarocks install clnn, and retry please? If it still fails, then I need to know the exact geometry you are using, ie input size, pool size, and stride.

jcjohnson commented 9 years ago

@napsternxg @hughperkins It looks like it's working! When you use the normalized network the default values for content weight, style weight, and TV weight won't give good results; in particular you should reduce the TV weight by an order of magnitude or more.

jcjohnson commented 9 years ago

Also if you use a network other than VGG-19 or its normalized variety, you'll need to change the layers used for style and content reconstruction. At master you can select these with the -style_layers and -content_layers flags, but it looks like you forked before those were added; you'll instead want to change the indices of the style and content layers here https://github.com/napsternxg/neural-style/blob/opencl/neural_style_opencl.lua#L90

vkorablin commented 9 years ago

Could someone test this on proper hardware w/ the default settings?

I've been playing with the normalized model at 256px size (on a 7750 w/ just 1GB GPU RAM), and while it shows some interesting results, one problem I've found is that it transplants more than just textures onto the target image. For example, here's a somewhat creepy Picasso/Pitt hybrid:

-normalize_gradients -content_weight 50000 -style_weight 90000

(Command line: th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -image_size 256 -model_file models/vgg_normalised.caffemodel -backend clnn -num_iterations 1000 -save_iter 50 -normalize_gradients -content_weight 50000 -style_weight 90000)

Would be good to make sure it's the hyperparameter choice or the model as opposed to the port.

hughperkins commented 9 years ago

How long does this take to train approximately?

vkorablin commented 9 years ago

@hughperkins w/ the command line I've given it's maybe 10 minutes or so on my hardware. Broad features will become clear after ~200 iterations (2-3 minutes?).

hughperkins commented 9 years ago

Hmmm, I get an error -4, memory object allocation failure, just at end of second block of 50 iterations. I have a 1GB card too (GeForce 940M). If we can modify the commandline to use a little bit less memory, I should probably be able to run both CUDA and OpenCL on it.

vkorablin commented 9 years ago

The most straightforward thing is to lower image size. -image_size 200 perhaps?
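
As a rough guide to how much a smaller image helps: the first convolution in the layer listing earlier in this thread, conv1_1, has 64 output channels, and a 3x3 convolution with padding 1 keeps the spatial size, so its output alone holds 64 x H x W floats. The sketch below is plain Lua and assumes a square image and 4-byte floats; `activation_bytes` is a hypothetical helper, and the real memory use (all layers, gradients, optimizer state) is much larger than this single buffer:

```lua
local BYTES_PER_FLOAT = 4 -- assumption: single-precision activations

-- Bytes needed for one activation map of the given size.
local function activation_bytes(channels, height, width)
  return channels * height * width * BYTES_PER_FLOAT
end

local at_256 = activation_bytes(64, 256, 256)
local at_200 = activation_bytes(64, 200, 200)

-- Sizes in bytes, then the fraction of memory left after shrinking the image.
print(at_256, at_200, at_200 / at_256)
```

Since every convolutional activation scales with the pixel count in the same way, going from 256 to 200 should cut activation memory to roughly 61% of what it was.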

hughperkins commented 9 years ago

Ok. Trying to brush the cobwebs off my cunn at the moment. Giving me some odd error about /home/user/torch/install/share/lua/5.1/cunn/init.lua:9: attempt to index field '_flattenTensorBuffer' (a nil value). Digging...

edit: hmmm, needs a new module called inn. Installing...

hughperkins commented 9 years ago

Ok, cunn runs now. Out of memory using cunn with imagesize 256. Trying 200...

hughperkins commented 9 years ago

after 750, using cunn: profile_750

(Edit: the commandline used: th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -image_size 200 -model_file models/vgg_normalised.caffemodel -num_iterations 1000 -save_iter 50 -normalize_gradients -content_weight 50000 -style_weight 90000)

vkorablin commented 9 years ago

Thank you! Looks quite similar indeed.

hughperkins commented 9 years ago

Does seem fairly convincing. We should probably fix the random seed, to be sure.

hughperkins commented 9 years ago

Hmmm, if I put torch.manualSeed(123) at line 12, then, each clnn run is identical to each other, and each cunn run is identical to each other, but the clnn and cunn outputs are slightly different. after 50 iterations:

cunn: profilecu2b_50

clnn: profilecl2b_50

hughperkins commented 9 years ago

Hmmm, and what is more, cpu gives same results as cuda. So I probably need to dig a bit. For 10 iterations:

cpu: profilecpuseed1861

cuda: profilecuseed1861

cl: profileclseed1861

Edit: hmmm, but -gpu -1 with neural_style_opencl.lua gives similar results to -gpu 0 with neural_style_opencl.lua, so might just be slightly different forks: profilecpuoclseed1861

Edit2: ok, looks like if the manualSeed is at line 186 or so, just after line -- initialize the image, then cpu, cuda, cl all almost agree, except that cl has a fairly blank margin down the right hand side. So I reckon one of the paddings in one of the layers has an issue somehow, somewhere. Will continue digging...

hughperkins commented 9 years ago

So, I've written the following script, to compare between cl, cuda, cpu: http://pastebin.com/jRGhyPij

by changing the numlayers value on line 66, can compare first numlayers layers, between cuda and cpu, and between cl and cpu
for the first 12 layers, cl and cuda completely agree, up to the display precision:

sumabsdiffcl    0.00019182558025932
maxabsdiffcl    4.6193599700928e-07
sumabsdiffcu    0.00019182558025932
maxabsdiffcu    4.6193599700928e-07

after 13 layers, they are different:

sumabsdiffcl    0.0003578155010473
maxabsdiffcl    3.5762786865234e-07
sumabsdiffcu    0.00027025085728383
maxabsdiffcu    2.9802322387695e-07

... so I probably need to check what is happening on layer 13 (which is: (13): nn.SpatialConvolutionMM(256 -> 256, 3x3, 1,1, 1,1))
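
The pastebin script is not reproduced here, but the names sumabsdiff and maxabsdiff suggest the sum and the maximum of the absolute elementwise differences against the CPU output. A minimal plain-Lua sketch of those two metrics, under that assumption (`abs_diff_stats` and the sample arrays are hypothetical, and real tensors would be flattened first):

```lua
-- Sum and maximum of absolute elementwise differences between two flat arrays.
local function abs_diff_stats(a, b)
  assert(#a == #b, "arrays must have the same length")
  local sum, max = 0, 0
  for i = 1, #a do
    local d = math.abs(a[i] - b[i])
    sum = sum + d
    if d > max then
      max = d
    end
  end
  return sum, max
end

local cpu = {0.5, 1.0, -2.0, 4.0}
local cl  = {0.5, 1.25, -2.0, 3.5}

local sumabsdiff, maxabsdiff = abs_diff_stats(cpu, cl)
print('sumabsdiff', sumabsdiff) -- 0.75
print('maxabsdiff', maxabsdiff) -- 0.5
```

The maximum is the more useful of the two for spotting a single bad region such as a margin, while the sum grows with tensor size even when every element is within rounding error.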

Edit2: fairly sure the vgg forwards/backwards is correct. Using this script: http://pastebin.com/1d73iQWK It does full forwards/backwards pass through vgg, for imagesize of 128. It does this 3 times: for cpu, for cl, for cuda. Then it compares the results, normalizes to 0-1 range, and saves to pngs. The results for cl-vs-cu, cl-vs-cpu, cu-vs-cpu are below. The cl and cuda vs cpu plots are comparable. There is no artifact down the right hand margin.

cl vs cu: clcudiff

cl vs cpu: clcpudiff

cu vs cpu: cucpudiff

napsternxg commented 9 years ago

@hughperkins I updated clnn and ran using the nin model. I am getting error in the SpatialAveragePooling_updateOutput function

allocate workbuffer
/home/username/Downloads/torch/install/bin/luajit: ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: bad argument #2 to 'SpatialAveragePooling_updateOutput' (input image smaller than kernel size)
stack traceback:
        [C]: in function 'SpatialAveragePooling_updateOutput'
        ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: in function 'updateOutput'
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural_style_opencl.lua:143: in function 'main'
        neural_style_opencl.lua:418: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

hughperkins commented 9 years ago

Hi Shubhanshu, the error 'input image smaller than kernel size' normally means the image size is too small. The image starts large, but each of the multiple poolings reduces it. Can you try a larger image size please? (By the way, can you paste appropriate wget commands, or similar, so I can try the nin model too please? I downloaded some kind of nin model, but it doesn't have the '_conv' suffix, so I'm not sure if it is the same one.)
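
To illustrate how repeated pooling can shrink an image below the kernel size, here is a plain-Lua sketch. It assumes a hypothetical stack of identical 2x2, stride-2 poolings in floor mode with no padding, which is not the actual NIN geometry; `pooled_size` and `trace_poolings` are names made up for this example:

```lua
-- Output length of one pooling step along one dimension (floor mode, no padding).
local function pooled_size(size, kernel, stride)
  return math.floor((size - kernel) / stride) + 1
end

-- Apply `count` identical poolings. Returns the size after each successful step
-- and the index of the first pooling whose input is smaller than its kernel
-- (nil if every pooling fits).
local function trace_poolings(size, kernel, stride, count)
  local sizes = {}
  for i = 1, count do
    if size < kernel then
      return sizes, i
    end
    size = pooled_size(size, kernel, stride)
    sizes[i] = size
  end
  return sizes, nil
end

local sizes, failed_at = trace_poolings(25, 2, 2, 5)
print(table.concat(sizes, " "), failed_at) -- sizes 12 6 3 1, fails at pooling 5
```

With a starting size of 512 the same five poolings give 256, 128, 64, 32, 16 and none of them fails, which is why a larger -image_size is the first thing to try.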

napsternxg commented 9 years ago

@hughperkins I increased the image size to the default 512. Now I get another error. This is possibly some issue in nn module.

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image profile.png -model_file models/nin_imagenet_conv.caffemodel -proto_file models/solver.prototxt -backend clnn -num_iterations 3

/home/username/Downloads/torch/install/bin/luajit: ...ity/Downloads/torch/install/share/lua/5.1/nn/SoftMax.lua:4: attempt to call field 'SoftMax_updateOutput' (a nil value)
stack traceback:
        ...ity/Downloads/torch/install/share/lua/5.1/nn/SoftMax.lua:4: in function 'updateOutput'
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural_style_opencl.lua:143: in function 'main'
        neural_style_opencl.lua:418: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

Also, I downloaded the nin model from https://github.com/BVLC/caffe/wiki/Model-Zoo#network-in-network-model All the required files are in the google drive link. https://drive.google.com/folderview?id=0B0IedYUunOQINEFtUi1QNWVhVVU&usp=drive_web

hughperkins commented 9 years ago

Hi Shubhanshu, yes:

missing opencl version of nn.SoftMax
added now
can you luarocks install clnn, and try again please?

hughperkins commented 9 years ago

(oh, for the disparity between cl and cu output, I think it's because 'ceil' actually changes the output size of the max pooling. So, I probably need to implement ceil, if we want the output to be the same between cl and cu)
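
The effect of ceil on the pooling output size can be sketched with the usual formula, out = round((in - kernel) / stride) + 1, where round is floor or ceil. This is plain Lua with zero padding assumed, and `pool_out` is a hypothetical helper, not the clnn code:

```lua
-- Pooling output length along one dimension, without padding.
-- ceil_mode selects ceil instead of floor for the division.
local function pool_out(size, kernel, stride, ceil_mode)
  local round = ceil_mode and math.ceil or math.floor
  return round((size - kernel) / stride) + 1
end

print(pool_out(25, 2, 2, false)) -- 12
print(pool_out(25, 2, 2, true))  -- 13
print(pool_out(24, 2, 2, false), pool_out(24, 2, 2, true)) -- 12 12
```

The two modes only differ when the input does not divide evenly: with an odd size such as 25, floor mode drops the last row or column while ceil mode keeps it as a partial window. That would be consistent with a difference that shows up along one edge of the image.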

jcjohnson commented 9 years ago

For neural-style, I don't think exact binary compatibility between cuda and opencl is a strict requirement; tiny differences should be fine as long as the same hyperparameters and inputs produce similar outputs. Of course, exactly matching the cuda outputs would be better.

For other applications though, ceil would be a great addition to cltorch since all of the caffe pretrained models rely on it.

hughperkins commented 9 years ago

For other applications though, ceil would be a great addition to cltorch since all of the caffe pretrained models rely on it.

Ok, good info. Thanks! :-)

hughperkins commented 9 years ago

Hi guys, please note that :ceil() mode is implemented for clnn SpatialMaxPooling layer now. If you luarocks install clnn, you should have access.

hughperkins commented 9 years ago

With :ceil() implemented, I think the results now are much more similar, between cl and cu. I can't quite decide whether the residual differences are because of rounding, or if there is still some small fundamental difference. For 100 iterations, image size 100:

cu: cu_

cl: cl_

Edit: here are the same settings as the earlier images, ie size=200, its=10. no longer an artefact down the right hand side, images look almost identical:

cu: cu_

cl: cl_

(To repeat these, just put torch.manualSeed(123) just after the comment -- Initialize the image, and use geometry and commandline something like:

th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -output_image cl_$name.png -image_size $size -model_file models/vgg_normalised.caffemodel -num_iterations $its -save_iter $its -normalize_gradients -content_weight 50000 -style_weight 90000 -backend clnn -optimizer lbfgs

)

Edit 3: using size=200, iterations=1000: cu: cu_ cl: cl_

Not quite the same, but fairly close, I think?

(Edit 4: Hmmm, I suppose an interesting question is: if I take the cu image, and give it to cl, is it a local minimum for cl too? and similarly for cl image giving to cu)

jcjohnson commented 9 years ago

Looks pretty good to me! If you wanted to track down the difference, I'd run for one iteration and dump all activations and gradients to a file and compare between clnn and cunn.

However it looks close enough, so if you want to rebase and clean up for a PR I'm happy to merge.
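
One way the dump-and-compare suggestion could look, as a plain-Lua sketch: each dump is assumed to be a list of flat arrays, one per layer, and the search reports the first layer whose values differ by more than a tolerance. The dump format, the helper names and the sample values are all hypothetical:

```lua
-- Largest absolute elementwise difference between two flat arrays.
local function max_abs_diff(a, b)
  assert(#a == #b, "layer outputs must have the same length")
  local max = 0
  for i = 1, #a do
    local d = math.abs(a[i] - b[i])
    if d > max then
      max = d
    end
  end
  return max
end

-- Index of the first layer whose dumps differ by more than tol, plus that
-- difference; returns nil if every layer agrees within tol.
local function first_divergent_layer(dump_a, dump_b, tol)
  for i = 1, #dump_a do
    local d = max_abs_diff(dump_a[i], dump_b[i])
    if d > tol then
      return i, d
    end
  end
  return nil
end

local clnn_dump = { {0.5, 1.0}, {2.0, 4.0}, {0.5, 1.0} }
local cunn_dump = { {0.5, 1.0}, {2.0, 4.0}, {0.5, 0.75} }

print(first_divergent_layer(clnn_dump, cunn_dump, 1e-5)) -- 3 0.25
```

Stopping at the first divergent layer matters because every later layer inherits the difference, so only the earliest mismatch points at the module to inspect.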

napsternxg commented 9 years ago

Using the command shown below and the vgg_normalised.caffemodel file, I am getting similar output to the above. However, the results are not as good as those from the full model.

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -model_file models/vgg_normalised.caffemodel -gpu 0 -backend clnn -image_size 150  -num_iterations 1000 -normalize_gradients -content_weight 50000 -style_weight 90000

@hughperkins could you get your code to work with nin_imagenet_conv.caffemodel? I couldn't get it to run. I am still getting the following errors:

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 1:4: Message type "caffe.NetParameter" has no field named "net".
Successfully loaded models/nin_imagenet_conv.caffemodel
MODULE data UNDEFINED
warning: module 'data [type 5]' not found

As well as the following:

/home/username/Downloads/torch/install/bin/luajit: ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: bad argument #2 to 'SpatialAveragePooling_updateOutput' (input image smaller than kernel size)
stack traceback:
        [C]: in function 'SpatialAveragePooling_updateOutput'
        ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: in function 'updateOutput'
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural_style_opencl.lua:143: in function 'main'
        neural_style_opencl.lua:418: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

Here is the full command I used and the corresponding processing log and errors.

$ th neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -model_file models/nin_imagenet_conv.caffemodel -proto_file models/solver.prototxt -gpu 0 -backend clnn -image_size 150  -num_iterations 1000 -normalize_gradients -content_weight 50000 -style_weight 90000
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 1:4: Message type "caffe.NetParameter" has no field named "net".
Successfully loaded models/nin_imagenet_conv.caffemodel
MODULE data UNDEFINED
warning: module 'data [type 5]' not found
conv1: 96 3 11 11
cccp1: 96 96 1 1
cccp2: 96 96 1 1
conv2: 256 96 5 5
cccp3: 256 256 1 1
cccp4: 256 256 1 1
conv3: 384 256 3 3
cccp5: 384 384 1 1
cccp6: 384 384 1 1
conv4-1024: 1024 384 3 3
cccp7-1024: 1024 1024 1 1
cccp8-1024: 1000 1024 1 1
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Turks
Apply_1t_1s_0pt_-2_*out = val1 build log: 
"/tmp/OCL14862T5.cl", line 53: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

Apply_1t_0s_0pt_-2_*out = (*out > 0) ? *out : 0 build log: 
"/tmp/OCL14862T19.cl", line 49: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

Apply_1t_1s_0pt_-2_*out *= val1 build log: 
"/tmp/OCL14862T26.cl", line 53: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

Apply_2t_0s_0pt_-2_-2_*out -= *in1 build log: 
"/tmp/OCL14862T29.cl", line 56: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

Apply_1t_1s_0pt_-2_*out = pown(*out, val1) build log: 
"/tmp/OCL14862T32.cl", line 53: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

THClReduceAll.cl build log: 
"/tmp/OCL14862T38.cl", line 9: warning: variable "in1" was declared but never
          referenced
    float *in1 = &_in1;
           ^

"/tmp/OCL14862T38.cl", line 10: warning: variable "out" was declared but never
          referenced
    float *out = &_out;
           ^

/tmp/luarocks_clnn-scm-1-9416/clnn/SpatialMaxPooling.cpp build log: 
"/tmp/OCL14862T46.cl", line 24: warning: a value of type
          "const __global float *" cannot be used to initialize an entity of
          type "__global float *"
    global Dtype *bottom_data = bottom_data_data + bottom_data_offset;
                                ^

Apply_2t_0s_0pt_-2_-2_*out *= *in1 build log: 
"/tmp/OCL14862T61.cl", line 56: warning: variable "thisLinearId" was declared
          but never referenced
        int thisLinearId;
            ^

/home/username/Downloads/torch/install/bin/luajit: ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: bad argument #2 to 'SpatialAveragePooling_updateOutput' (input image smaller than kernel size)
stack traceback:
        [C]: in function 'SpatialAveragePooling_updateOutput'
        ...torch/install/share/lua/5.1/nn/SpatialAveragePooling.lua:14: in function 'updateOutput'
        .../Downloads/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        neural_style_opencl.lua:143: in function 'main'
        neural_style_opencl.lua:418: in main chunk
        [C]: in function 'dofile'
        ...oads/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

jcjohnson / neural-style

OpenCL support #44