hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License

[FeatureRequest] port SpatialUpSamplingNearest and SpatialBatchNormalization from cunn #31

Closed pawni closed 8 years ago

pawni commented 8 years ago

I tried to rebuild something like http://tinyclouds.org/colorize/ and found that SpatialUpSamplingNearest and SpatialBatchNormalization are not available within clnn. I tried porting Upsampling yesterday (https://github.com/pawni/clnn/commit/091882dea693f9500907341f26c1c492a9b76ad8). It seems to work, but I haven't tested it properly yet. I will continue looking at that and at BatchNormalization; input would be much appreciated. :)

hughperkins commented 8 years ago

Cool. Sounds good :-) Looks like you've already ported it to THNN too? Let me know when you want me to merge.

pawni commented 8 years ago

Yes, I did, as I didn't know how to use it without putting it into THNN ;) Is there anything I should keep in mind when porting these things, or is the current port okay as it is?

hughperkins commented 8 years ago

Ah... hmm... ah :-) Yes, there is a breaking change coming up. Look at https://github.com/torch/nn/pull/601 to see what's going to happen, and the effect:

git clone https://github.com/fmassa/nn.git
cd nn
git checkout THNN_fmassa_2
luarocks make rocks/nn-scm-1.rockspec

Now, try re-running your tests :-)

hughperkins commented 8 years ago

Hi, the THNN changes have been merged in now https://github.com/torch/nn/pull/601#event-554379230

pawni commented 8 years ago

Hi, saw that and tested it. I think THNN isn't the problem, but I get some weird values which I need to fix. I need to check, when I have some time, whether it is in the CL code or somewhere in the C++ that my values are getting corrupted. It seems like I am running over the limits of the data types, but I'm not sure.

hughperkins commented 8 years ago

Ok. Just in case it's useful, what I normally do for testing kernels is:

kernel void something(global float *out, global const float *b /*, ... */) {
  // stuff here

  // out[get_global_id(0)] = ...; // normal output write, commented out

  if (get_global_id(0) == 0) { // only one work-item ever enters this
    out[0] = 123.0f; // some visible value, to check anything is happening at all
    out[1] = b[0];   // find out what is in b[0]
    // etc ...
  }
}
pawni commented 8 years ago

I tried adding your debug code, but I can't even see the 123 in the output. Furthermore, it mostly does not change the output, but sometimes it seems to put random values into it. Do you have any idea why that might happen? I made the changes here: https://github.com/pawni/clnn/blob/master/lib/THCLNN/SpatialUpSamplingNearest.cpp There is also a test script testing the module, similar to the other ones.

hughperkins commented 8 years ago

(accidentally clicked into this. please paste a new, dummy, message, so it stays in my notifications. i will check after work / at weekend)

pawni commented 8 years ago

(push)

hughperkins commented 8 years ago

oh, it should be k.out(output).

Also, you'll need to call synchronize(), and copy the data back from the GPU to the CPU.

To synchronize is something like:

    cl->finish()

To copy back from gpu to cpu is something like:

THClTensor_wrapper(state, output)->copyToHost();
hughperkins commented 8 years ago

(it might be that changing from k.in to k.out, for the output, is sufficient to solve the issue. But usually custom kernels require a fair amount of debugging. Depends...)

hughperkins commented 8 years ago

(just to be clear, you don't need to synchronize, or copy data back, normally; just if you are debugging. Actually, since you are using cout to print, it might handle that automatically, I'm not sure)

hughperkins commented 8 years ago

(As for the difference between k.in and k.out: basically, k.in copies data from main memory to GPU memory before running the kernel, and does nothing afterwards, as far as that particular value or tensor is concerned. k.out leaves the GPU-side buffer as created, not initialized in any particular way, and certainly not with the contents of the CPU-side tensor copied into it. After the kernel runs, the data will be copied from the GPU back into main memory. It's been a while, but I think the way it does this is by setting a 'dirty' flag, so it won't actually copy anything straight away; the copy takes place when you next try to read the data, as long as you read it through the tensor API methods. There's also inout, which transfers in both directions. In the back of my mind, I think inout is the same as one of in or out in practice, but I can't remember which, so just use the one that matches what should be happening)

pawni commented 8 years ago

That made it an easy fix - I just changed the out and added the finish() and copyToHost() and it seems to work now. Thanks a lot! Shall I create a PR for that?

pawni commented 8 years ago

(see https://github.com/pawni/clnn/commit/7f9a8677d429a6c1376dd3fef3b4783bad6ac553 )

hughperkins commented 8 years ago

> I just changed the out and added the finish() and copyToHost() and it seems to work now.

Ok great!

Actually, the finish and copyToHost are only needed for debugging (so you can see the output, and even then I'm not sure they're needed, if you use cout, as you do). You should be able to remove them now, and it should continue to work. Hopefully :-)

pawni commented 8 years ago

Just changed it and it is still working as it should - thanks! :) Do you want me to PR?

hughperkins commented 8 years ago

Yes, please create a PR.

hughperkins commented 8 years ago

Well... I get the following error messages when I run the tests:

SpatialUpSamplingNearest_backward
 Function call failed 
/home/ubuntu/torch/install/share/lua/5.1/nn/THNN.lua:1045: OpenCL error, code: CL_INVALID_ARG_SIZE at /home/ubuntu/git/cltorch/src/lib/THClKernels.cpp:27
stack traceback:
    [C]: in function 'v'
    /home/ubuntu/torch/install/share/lua/5.1/nn/THNN.lua:1045: in function 'SpatialUpSamplingNearest_updateOutput'
    ...ch/install/share/lua/5.1/nn/SpatialUpSamplingNearest.lua:50: in function 'forward'
    ...tall/share/lua/5.1/clnn/testSpatialUpSamplingNearest.lua:123: in function 'v'
    /home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    /home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
SpatialUpSamplingNearest_forward_batch
 Function call failed 
/home/ubuntu/torch/install/share/lua/5.1/nn/THNN.lua:1045: OpenCL error, code: CL_INVALID_ARG_SIZE at /home/ubuntu/git/cltorch/src/lib/THClKernels.cpp:27
stack traceback:
    [C]: in function 'v'
    /home/ubuntu/torch/install/share/lua/5.1/nn/THNN.lua:1045: in function 'SpatialUpSamplingNearest_updateOutput'
    ...ch/install/share/lua/5.1/nn/SpatialUpSamplingNearest.lua:50: in function 'forward'
    ...tall/share/lua/5.1/clnn/testSpatialUpSamplingNearest.lua:36: in function 'v'
    /home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /home/ubuntu/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    /home/ubuntu/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

Thoughts?

hughperkins commented 8 years ago

Probably no_elements should be an int, to match what you are feeding to k.in, and because pretty much everything in clnn is ints for now (no-one has ever complained once about this, as far as i remember, so ints they stay for now :-) )

pawni commented 8 years ago

Weird - it runs fine for me... What are the versions of your other libraries / environment, so that I can try to replicate? But I also changed the long to int (I just kept it from the conversion ;) ), so you could try again.

hughperkins commented 8 years ago

Ok. Seems like there is a bunch of stuff missing from my CMakeLists.txt too, so cog isn't running. All that stuff in quotation marks at the bottom of the .cpp file should be generated automatically by cog.

hughperkins commented 8 years ago

hmmm, mine still fails actually. I'll push where I am to a branch.

hughperkins commented 8 years ago

Here's the changes I've made: https://github.com/pawni/clnn/compare/master...hughperkins:pawni-master?expand=1 You should be able to run 'cog' to update the bits in quotation marks at the bottom of the cpp files by doing:

hughperkins commented 8 years ago

> Weird - it runs fine for me.. what are the versions of your other libraries / environment so that I can try to replicate? But I also changed the long to int (just kept it from the conversion ;) ) so you could try again

Well, the specific error CL_INVALID_ARG_SIZE normally means something like: there is a mismatch between the arguments you are passing to k.in and the arguments expected by the kernel, in the .cl file. It could plausibly be GPU-specific, but it shouldn't depend on other libraries, I think.

I'm using an NVIDIA 940M, on Ubuntu 15.10 64-bit.

hughperkins commented 8 years ago

Oh wait, changing from long to int fixed forward, but backward is still broken. I will do the same thing in backward.

hughperkins commented 8 years ago

Ok, passing now. I've merged to master: https://github.com/hughperkins/clnn/commits/master

pawni commented 8 years ago

awesome, thanks!

hughperkins commented 8 years ago

Thank you very much Nick

pawni commented 8 years ago

Thank you for the support! I hope that I'll also find time to get my head around batch normalisation, but that looked a bit harder than the upsampling - I'll keep you posted.

hughperkins commented 8 years ago

Ok, cool. Sounds good :-) Batch normalization would be very useful :-)

hughperkins commented 8 years ago

Oh... SpatialBatchNormalization does look quite challenging... The good news is: no Thrust. Thrust is basically not directly portable; it needs a bunch of creativity.

SpatialBatchNormalization contains a zillion kernels, but they should all be fairly directly portable. Probably a bunch of work, though. All those bits with triple angle brackets are CUDA kernel launches, e.g.:

SpatialBatchNormalizationBackward_kernel<8>
      <<<blocks, threads, 0, s>>>
      (input, gradOutput, gradInput, gradWeight, gradBias, weight,
       saveMean, saveStd, scale);

The kernel itself is higher up in the same file, and it's a template, with parameters.

OpenCL itself doesn't handle templates. OpenCL is essentially C99, whereas templates are from C++. What I'm doing is using Lua as a templating language.

You can see an example by looking at the CUDA implementation of THCApply.cuh, in cutorch, and comparing it to the OpenCL version, ie compare https://github.com/torch/cutorch/blob/master/lib/THC/THCApply.cuh with https://github.com/hughperkins/cltorch/blob/master/src/lib/THClApply.cpp and https://github.com/hughperkins/cltorch/blob/master/src/lib/THClApply.cl

The stuff with curly brackets {% ... %} is templating, in the style of Jinja2, but using {% ... %}, instead of {{ ... }}.

hughperkins commented 8 years ago

Just noticed that SpatialBatchNormalization used to be working, and was broken in https://github.com/torch/nn/pull/665 and an earlier commit. Fixed it now. Should be working again now?

hughperkins commented 8 years ago

Hi, I decided to take the radical step of rolling back torch and cltorch to around 21 February, prior to a bunch of the THNN changes, which were causing entropy at a rate faster than I could handle. This has the downside that that's just slightly before your PR for SpatialUpSamplingNearest, so you might need to resubmit that PR, please.

hughperkins commented 8 years ago

Should all be working again now, I think?