hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License

missing implementations on SpatialMaxPooling_updateGradInput #12

Closed: brunoro closed this issue 9 years ago

brunoro commented 9 years ago

I'm porting some cunn code over to clnn and stumbled over the following error:

/Users/brunoro/dev/torch/install/bin/luajit: ...dev/torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:41: Not implemented at /Users/brunoro/dev/clnn/SpatialMaxPooling.cpp:166
stack traceback:
    [C]: in function 'SpatialMaxPooling_updateGradInput'
    ...dev/torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:41: in function 'updateGradInput'
    ...rs/brunoro/dev/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    ...runoro/dev/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
    neural_style.lua:259: in function 'opfunc'
    .../brunoro/dev/torch/install/share/lua/5.1/optim/lbfgs.lua:66: in function 'lbfgs'
    neural_style.lua:278: in function 'main'
    neural_style.lua:439: in main chunk
    [C]: in function 'dofile'
    .../dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010a4d1190

Browsing through SpatialMaxPooling.cpp, I found that there's actually some commented-out code just above the line that throws that exception. Are there any plans to implement those cases?

Also, a CNN noob question: what do those cases stand for? I'd be happy to implement them if someone can point me to a reference on what this method is actually doing.

hughperkins commented 9 years ago

I'm porting some cunn code over to clnn

Awesome!

Are there any plans to implement those cases?

I sort of implement stuff as and when it becomes necessary. If you have a moment to implement one of the commented out cases that would be great! :-)

Also, a CNN noob question: what do those cases stand for?

In max-pooling, we take a window of kW (pooling width) by kH (pooling height) pixels and find the maximum value in that window; that max value is the output for the window. Then we move on to the next window and do the same thing. Useful web pages: http://ufldl.stanford.edu/wiki/index.php/Pooling and http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
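
In code, the forward pass looks roughly like this. It's a minimal serial sketch over a single H x W plane, not the actual clnn OpenCL kernel, and the function name and layout are just for illustration:

#include <cfloat>
#include <vector>

// Forward max pooling over one plane, recording which input index produced
// each maximum so the backward pass can route gradients back to it.
void maxPoolForward(const float* input, int iH, int iW,
                    int kH, int kW, int dH, int dW,
                    std::vector<float>& output, std::vector<int>& argmax) {
    int oH = (iH - kH) / dH + 1;
    int oW = (iW - kW) / dW + 1;
    output.assign(oH * oW, -FLT_MAX);
    argmax.assign(oH * oW, -1);
    for (int oy = 0; oy < oH; oy++) {
        for (int ox = 0; ox < oW; ox++) {
            for (int ky = 0; ky < kH; ky++) {
                for (int kx = 0; kx < kW; kx++) {
                    int iy = oy * dH + ky;
                    int ix = ox * dW + kx;
                    float v = input[iy * iW + ix];
                    if (v > output[oy * oW + ox]) {
                        output[oy * oW + ox] = v;
                        argmax[oy * oW + ox] = iy * iW + ix;  // remember the winning pixel
                    }
                }
            }
        }
    }
}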

When we back-propagate, we take the gradient from each gradOutput pixel and send it back to the input pixel that produced the max value on the forward pass. So we have to store which pixel that was when we do the forward propagation.
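
And the backward pass, continuing the same sketch (same argmax convention as above): each gradOutput value goes to the input pixel that won the max. In this serial loop the += quietly handles overlapping pools; in a parallel kernel, that accumulation is exactly where the difficulty described below comes from.

#include <cstddef>
#include <vector>

// Route each gradOutput value back to the input pixel recorded in argmax.
// gradInput must already be sized to iH * iW.
void maxPoolBackward(const std::vector<float>& gradOutput,
                     const std::vector<int>& argmax,
                     std::vector<float>& gradInput) {
    for (std::size_t i = 0; i < gradInput.size(); i++) gradInput[i] = 0.0f;
    for (std::size_t o = 0; o < gradOutput.size(); o++) {
        gradInput[argmax[o]] += gradOutput[o];  // += matters when pools overlap
    }
}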

The pooling areas can be exactly contiguous, which is the easy situation. In this case, each gradOutput value maps to exactly one gradInput pixel. Well, that's always true; the point is that each gradInput pixel takes input from exactly 0 or 1 gradOutput pixels. But rather than making the pooling areas contiguous, we can make them overlap, and then some gradInput pixels need to be updated with the sum of several gradOutput pixels. That is a bit tricky to do from parallel threads, so we consider this case separately. The stride, dW and dH, determines the horizontal and vertical distance between the pooling areas (each kW by kH). When dW == kW and dH == kH, we have the contiguous case. When dW is less than kW, or dH is less than kH, we have overlapping pools.
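
For concreteness, here is the case split in code, using the same names as above (an illustration, not the clnn source):

#include <cstdio>

// dW == kW and dH == kH: contiguous pools.  dW < kW or dH < kH: overlapping pools.
bool poolsOverlap(int kW, int kH, int dW, int dH) {
    return dW < kW || dH < kH;
}

int main() {
    printf("%d\n", poolsOverlap(3, 3, 2, 2));  // 1: overlapping (the common 3x3 pool, 2x2 stride case)
    printf("%d\n", poolsOverlap(2, 2, 2, 2));  // 0: contiguous
    return 0;
}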

Actually, you might not need to handle this, because:

if (input->nDimension == 3) {

Spatial max pooling is over a 2d image, so that is 2 dimensions, W (width) and H (height). But typically there will be more than one image plane per incoming example, so we have nInputPlane image planes per example: three dimensions. Finally, we can provide a mini-batch of multiple examples, which makes four dimensions.

Currently, the implementation handles mini-batches, but not single non-batched examples.
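
To make the dimension counts concrete, this is how a single element would be addressed in each layout, assuming plain row-major storage (just an illustration of the shapes, not clnn code):

#include <cstddef>

// 3-d, non-batched: (plane, y, x) in an nInputPlane x H x W tensor
std::size_t index3d(int plane, int y, int x, int H, int W) {
    return ((std::size_t)plane * H + y) * W + x;
}

// 4-d, batched: (example, plane, y, x) in an N x nInputPlane x H x W tensor
std::size_t index4d(int n, int plane, int y, int x, int nInputPlane, int H, int W) {
    return (((std::size_t)n * nInputPlane + plane) * H + y) * W + x;
}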

As far as this bit:

} else if((kW>>1) <= dW && (kH>>1) <= dH) {

To handle overlapping pools, I noticed that in practice we mostly have 3 x 3 pools with a 2 x 2 stride, so as a first simplification I only consider that case. Actually, let's consider an even simpler case for now: an input image with a single row, 1 x 3 pools, and a 1 x 2 stride. In this case, the pools overlap horizontally:

|pool 1|
     |pool 2|
          |pool 3|
              ... etc

But no more than 2 pools overlap for any input/gradInput pixel. So, we can do the pooling in two batches. In the first batch, we do the odd pools:

|pool 1|
          |pool 3|

These don't overlap :-)

Then we do the even pools:

     |pool 2|
               |pool 4|

... don't overlap either. And we add their results to the gradInput from the first batch of pools.
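
As a sketch (serial code, reusing the argmax convention from the forward-pass sketch above, and assuming 1 x 3 pools with a 1 x 2 stride): pools starting at even output indices go in pass 0 and the rest in pass 1, so within a pass no two pools touch the same gradInput pixel and the writes could safely run in parallel.

#include <cstddef>
#include <vector>

// gradInput must be zero-initialised and sized to the input width.
void maxPoolBackward1dTwoPass(const std::vector<float>& gradOutput,
                              const std::vector<int>& argmax,
                              std::vector<float>& gradInput) {
    for (std::size_t pass = 0; pass < 2; pass++) {
        // pass 0: pools 0, 2, 4, ...   pass 1: pools 1, 3, 5, ...
        for (std::size_t o = pass; o < gradOutput.size(); o += 2) {
            gradInput[argmax[o]] += gradOutput[o];  // no write conflicts within one pass
        }
    }
}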

In 2 dimensions, we'll need 4 such batches. For example, we handle these pools first:

X . X . X .
. . . . . .
X . X . X .
. . . . . .

(where 'X' are the pools we are calculating, and '.' are the ones we skip for now)

Next batch will be:

. X . X . X
. . . . . . 
. X . X . X
. . . . . .

Then:

. . . . . .
X . X . X .
. . . . . .
X . X . X .

Finally:

. . . . . .
. X . X . X
. . . . . . 
. X . X . X

So, 4 times.
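
Here is the same idea in 2 dimensions, again as an illustrative serial sketch (not the clnn kernel): we split the oH x oW output grid into 4 groups by the parity of each pool's row and column index. With 3 x 3 pools and a 2 x 2 stride, no two pools in a group share an input pixel, so each group could be one parallel launch.

#include <vector>

// gradInput must be zero-initialised and sized to iH * iW.
void maxPoolBackward2dFourPass(const std::vector<float>& gradOutput,
                               const std::vector<int>& argmax,
                               std::vector<float>& gradInput,
                               int oH, int oW) {
    for (int py = 0; py < 2; py++) {          // row parity of the pool
        for (int px = 0; px < 2; px++) {      // column parity of the pool
            for (int oy = py; oy < oH; oy += 2) {
                for (int ox = px; ox < oW; ox += 2) {
                    int o = oy * oW + ox;
                    gradInput[argmax[o]] += gradOutput[o];
                }
            }
        }
    }
}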

This actually generalizes to any case where the stride is at least half the pooling size, hence that weird-looking if condition earlier.

It would be easy enough to handle the fully general case, though. For example, if the stride is 1 and the pooling size is 3, we'd simply need to do the backpropagation in 9 batches, adding each batch's results to the gradInput produced by the earlier updateGradInput batches.
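
A hypothetical generalisation of the same scheme (not something clnn currently implements): the number of passes per dimension is the number of pools that can cover one input pixel, i.e. ceil(k / d), so a stride of 1 with a pooling size of 3 gives 3 * 3 = 9 passes.

#include <vector>

// gradInput must be zero-initialised and sized to iH * iW.
void maxPoolBackward2dGeneral(const std::vector<float>& gradOutput,
                              const std::vector<int>& argmax,
                              std::vector<float>& gradInput,
                              int oH, int oW, int kH, int kW, int dH, int dW) {
    int stepY = (kH + dH - 1) / dH;   // ceil(kH / dH) passes vertically
    int stepX = (kW + dW - 1) / dW;   // ceil(kW / dW) passes horizontally
    for (int py = 0; py < stepY; py++) {
        for (int px = 0; px < stepX; px++) {
            for (int oy = py; oy < oH; oy += stepY) {
                for (int ox = px; ox < oW; ox += stepX) {
                    int o = oy * oW + ox;
                    gradInput[argmax[o]] += gradOutput[o];  // pools within one (py, px) pass never overlap
                }
            }
        }
    }
}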

fmassa commented 9 years ago

@hughperkins nn and cunn SpatialMaxPooling (and soon SpatialAveragePooling) were updated to support arbitrary padding pad_w and pad_h. Also, the CUDA kernels were changed (borrowed from Caffe), and there's now no more need for atomic operations in the backward case. I have absolutely no knowledge of OpenCL, but I think adapting those kernels would be an easier way to generalise SpatialMaxPooling to all the cases.
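
Roughly, the idea is to make the backward pass gradInput-centric: one thread per gradInput pixel loops over the pools that could contain it and accumulates the gradOutput values it actually won, so no two threads ever write the same location and no atomics are needed. A serial C++ sketch of the indexing (the real kernels are CUDA, in cunn/Caffe; names here are illustrative and reuse the argmax convention from the sketches above):

#include <algorithm>
#include <vector>

void maxPoolBackwardPerInputPixel(const std::vector<float>& gradOutput,
                                  const std::vector<int>& argmax,
                                  std::vector<float>& gradInput,
                                  int iH, int iW, int oH, int oW,
                                  int kH, int kW, int dH, int dW) {
    for (int iy = 0; iy < iH; iy++) {
        for (int ix = 0; ix < iW; ix++) {
            // range of pools whose window covers input pixel (iy, ix)
            int oyStart = (iy < kH) ? 0 : (iy - kH) / dH + 1;
            int oyEnd   = std::min(iy / dH, oH - 1);
            int oxStart = (ix < kW) ? 0 : (ix - kW) / dW + 1;
            int oxEnd   = std::min(ix / dW, oW - 1);
            float sum = 0.0f;
            for (int oy = oyStart; oy <= oyEnd; oy++) {
                for (int ox = oxStart; ox <= oxEnd; ox++) {
                    int o = oy * oW + ox;
                    if (argmax[o] == iy * iW + ix) sum += gradOutput[o];
                }
            }
            gradInput[iy * iW + ix] = sum;  // each gradInput location is written by exactly one "thread"
        }
    }
}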

hughperkins commented 9 years ago

@fmassa Ah, good info. Thanks!

hughperkins commented 9 years ago

Hi Gustavo,

Several people in the neural-style project encountered the same issue, so I've copy/pasted an implementation for contiguous non-batched pools; if you update your clnn (luarocks install clnn), it might do what you need. If you're using non-contiguous pools, you'll need to either do some extra copy/pasting (or even, ideally, some factorization :-) ), or else follow Francisco's heads-up that the cunn implementation might be worth re-porting over now. I reckon the path of least resistance for now would be to just add an extra copy-paste block :-) Or perhaps factorize a bit.

In terms of testing clnn, I basically run the following tests currently, on a single OS (Ubuntu 14.04) and a single GPU (NVIDIA 940M). This is quite quick to do, so please feel free to change things in whatever way you think is beautiful :-) Ideally without diverging too much from cunn, but I've changed tons of stuff, so it's fairly flexible. The tests I do are:

th -l clnn -e 'clnn.test()'
-- all tests should pass
git clone git@github.com:karpathy/char-rnn.git
cd char-rnn
th -opencl 1 train.lua
-- training loss should decrease from 3-4 ish to about 2-3 ish, shouldn't become NaN, and shouldn't crash. About 5 iterations are sufficient.

brunoro commented 9 years ago

Hi @hughperkins, thanks for the detailed explanation. Cool, one of the pieces of code I was trying to get running with clnn was actually neural-style, so thanks for pointing out the issue on their repo.

I'll try this week to get it running with some copy/pasting (and/or factorization). Otherwise, re-porting the cunn implementation might be a nice way to brush up my OpenCL :D

hughperkins commented 9 years ago

Cool :-) By the way, if you re-port cunn code, you might want to try the following:

git clone git@github.com:torch/cunn.git
git clone git@github.com:hughperkins/clnn.git
cd clnn
python util/port.py
meld port . &

This will show you the diff between the automatically ported cunn files (in the port directory) and the current clnn files (in the . directory). It's far from perfect, but it can provide a useful first draft to work from.

hughperkins commented 9 years ago

Hi Gustavo, please note that the cunn SpatialMaxPooling has been ported across now, since :ceil() is needed by neural-style. You can luarocks install clnn to pull down the latest version.

brunoro commented 9 years ago

Oh, sweet. I was in the middle of porting the kernel from cunn, so I guess I don't need to finish that.

hughperkins commented 9 years ago

Ok. I can close this issue, right?

brunoro commented 9 years ago

Yep, thanks a lot!