Closed hughperkins closed 8 years ago
@hughperkins The best example on usage I can give you at the moment is here: https://github.com/naibaf7/caffe/blob/master/src/caffe/layers/libdnn_conv_layer.cpp (note that the last part of the code, the "tune" method is experimental at this stage).
Ok. `top` is input, and `bottom` is output? Presumably they are both NCHW? What about the layout of weights? I guess, without thinking too hard, bias is just `Co`, where `Co` is 'output channels'?
Hmmm, I think I might plug it into https://github.com/hughperkins/neon-benchmarks first, since it both measures performance, and checks correctness. However, needs to be callable from python. Probably easiest would be to use ffi, via a C interface? I might just write a C wrapper myself, should be relatively straightforward?
@hughperkins Yeah, if you can copy the data into cl_mem and create a valid ViennaCL context from your OpenCL context (methods for that are included in libdnn) it should be rather easy to test.
A ViennaCL context? I already have an OpenCL context created by PyOpenCL. Can I use that? Does using the ViennaCL context imply that the queues will be different, or can I use the same queue on both the PyOpenCL and ViennaCL sides?
@hughperkins This should help you:
```cpp
#ifdef USE_OPENCL
void device::setupViennaCLContext(int id, const &cl_context ctx, const &cl_device dev, const &cl_command_queue queue) {
  viennacl::ocl::setup_context(id, ctx, dev, queue);
}
#endif
```
This creates a ViennaCL context with the provided id and the OpenCL context, device and queue that already exist in your PyOpenCL setup. (https://github.com/naibaf7/libdnn/blob/master/src/device.cpp)
Excellent! Sounds good. Will take a look. Will close the issue for now :-)
Hi. Ok, I'm being a bit slow I know. Ok, so far what I have is:
build.sh, this is what I run to do the build:
```bash
#!/bin/bash
echo '###############'
g++ -std=c++11 -Iinclude -Ithirdparty/ViennaCL-1.7.1 -o test test.cpp -lOpenCL
```
This file is in the root of the cloned 'libdnn' repo.
Note that ViennaCL is in the thirdparty/ViennaCL-1.7.1 subdirectory of the libdnn repo.
test.cpp:
```cpp
#include <algorithm>
#include <vector>

#include "CL/cl.h"

#define USE_OPENCL
#include "libdnn.hpp"

using namespace std;
using namespace greentea;

// #ifdef USE_OPENCL
// void device::setupViennaCLContext(int id, const &cl_context ctx, const &cl_device dev, const &cl_command_queue queue) {
//   viennacl::ocl::setup_context(id, ctx, dev, queue);
// }
// #endif

#define Dtype float

int main(int argc, char *argv[]) {
  cl_int err;
  cl_platform_id platform = 0;
  cl_device_id cl_device = 0;
  cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, 0, 0 };
  cl_context ctx = 0;
  cl_command_queue queue = 0;

  err = clGetPlatformIDs(1, &platform, NULL);
  if (err != CL_SUCCESS) {
    printf("clGetPlatformIDs() failed with %d\n", err);
    return 1;
  }
  cout << "got platforms" << endl;

  err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &cl_device, NULL);
  if (err != CL_SUCCESS) {
    printf("clGetDeviceIDs() failed with %d\n", err);
    return 1;
  }

  props[1] = (cl_context_properties)platform;
  ctx = clCreateContext(props, 1, &cl_device, NULL, NULL, &err);
  if (err != CL_SUCCESS) {
    printf("clCreateContext() failed with %d\n", err);
    return 1;
  }

  queue = clCreateCommandQueue(ctx, cl_device, 0, &err);
  if (err != CL_SUCCESS) {
    printf("clCreateCommandQueue() failed with %d\n", err);
    clReleaseContext(ctx);
    return 1;
  }

  cl_mem bufA, bufB, bufC;
  bufA = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);
  bufB = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);
  bufC = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);

  int id = 123;
  device::setupViennaCLContext(id, ctx, cl_device, queue);

  device mydevice;
  shared_ptr<LibDNNConv<Dtype> > libdnn_;

  LibDNNConfig config;
  config.dev_ptr = mydevice;
  config.in_shape = std::vector<int_tp>(3, 1);
  config.out_shape = std::vector<int_tp>(3, 1);
  config.kernel = std::vector<int_tp>(1, 1);
  config.pad = std::vector<int_tp>(1, 0);
  config.stride = std::vector<int_tp>(1, 1);
  config.dilation = std::vector<int_tp>(1, 0);
  config.group = 1;
  config.bias_term = false;
  config.fast_unsafe_math = false;
  config.weights_backward = bufA;
  config.bias_backward = bufB;
  // if (std::is_same<Dtype, float>::value ||
  //     this->device_->CheckCapability("cl_khr_int64_base_atomics")) {
  config.wgalgo = LIBDNN_CONVOLUTION_WG_ALGO_ATOMIC;
  config.bwalgo = LIBDNN_CONVOLUTION_BW_ALGO_COL2IM_ATOMIC;
  // } else {
  //   config.wgalgo = LIBDNN_CONVOLUTION_WG_ALGO_DIRECT;
  //   config.bwalgo = LIBDNN_CONVOLUTION_BW_ALGO_IM2COL;
  // }

  LibDNNConv<Dtype>* libdnn = new LibDNNConv<Dtype>(config);
  libdnn_.reset(libdnn);
  // delete libdnn;

  return 0;
}
```
Result of running `build.sh`:
```text
ubuntu@peach:~/torch-cl/opencl/libdnn$ ./build.sh
###############
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:51: error: ISO C++ forbids declaration of ‘cl_context’ with no type [-fpermissive]
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                   ^
include/device.hpp:24:62: error: expected ‘,’ or ‘...’ before ‘ctx’
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                              ^
include/device.hpp:51:3: error: ‘viennacl’ does not name a type
   viennacl::ocl::program ocl_program_;
   ^
test.cpp: In function ‘int main(int, char**)’:
test.cpp:63:59: error: no matching function for call to ‘greentea::device::setupViennaCLContext(int&, _cl_context*&, _cl_device_id*&, _cl_command_queue*&)’
   device::setupViennaCLContext(id, ctx, cl_device, queue);
                                                          ^
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:15: note: candidate: static void greentea::device::setupViennaCLContext(int, const int&)
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
               ^
include/device.hpp:24:15: note: candidate expects 2 arguments, 4 provided
test.cpp:69:20: error: cannot convert ‘greentea::device’ to ‘greentea::device*’ in assignment
   config.dev_ptr = mydevice;
                    ^
```
Thoughts? I know I'm kind of swimming here...
@hughperkins OK, so, these things: pass `&mydevice` where a `device*` is expected (i.e. `config.dev_ptr = &mydevice;`), and the parameter `const &cl_context ctx` should just be `const cl_context &ctx`.
Note that the first fatal error is simply because of including libdnn.hpp :
```cpp
#define USE_OPENCL
#include "libdnn.hpp"
```
Causes:
```text
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:51: error: ISO C++ forbids declaration of ‘cl_context’ with no type [-fpermissive]
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                   ^
include/device.hpp:24:62: error: expected ‘,’ or ‘...’ before ‘ctx’
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                              ^
include/device.hpp:51:3: error: ‘viennacl’ does not name a type
   viennacl::ocl::program ocl_program_;
```
Don't suppose... could you write a short, complete example of taking two (empty) `cl_mem` buffers, one representing weights and one representing input, and convolving them?
@hughperkins Yep, I will do that when I find time for it... a lot going on at the moment.
So you see, I'm quite in a mess considering I have to do all that alone :)
With GSoC we can help you to port kernels from caffe. /cc @edgarriba
Once libdnn package exporting is working we can start with that.
Question: possible to migrate your convnet-benchmarks PR to work with libdnn? Then I can probably just base off of that.
@hughperkins What do you mean by that? Do you need example usage code? The two go-tos here that work now are tiny-cnn and Caffe. https://github.com/edgarriba/tiny-cnn
The convnet-benchmark for libdnn is done through Caffe.
I looked through tiny-cnn, but couldn't quite figure out where to find the magic bits of code that actually do the convolutions.
@hughperkins take a look here https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L69
It's a minimal example that needs further optimization when setting the batch size and transferring the cl_mem to the next operations. But at least it works.
Ok, this compiles and runs, so:
Next up, create the buffers. I guess I need to create some kind of ViennaCL iterator-type objects?
Oh, maybe the buffers are just `cl_mem`s:

```cpp
kernel(WrapHandle((cl_mem)bottom_data, &ctx),
       WrapHandle((cl_mem)weight, &ctx),
       WrapHandle((cl_mem)bias, &ctx),
       WrapHandle((cl_mem)top_data, &ctx));
```
So, what should `Dtype` be? If `bottom_data` is both a `const Dtype *` and a `cl_mem`, then `Dtype *` is `cl_mem`, and `Dtype` is... ummm... well, it's not a pointer to `cl_mem`, since that'd be the wrong way around, so... ???? ... Ok, in cl.h, `cl_mem` is:

```cpp
typedef struct _cl_mem * cl_mem;
```

So, `Dtype` is `struct _cl_mem`?
See comments starting from https://github.com/BVLC/caffe/issues/4155#issuecomment-235441995
@hughperkins
You are right, buffers have to be created as `cl_mem`; however, LibDNN requires `const Dtype*`, where `Dtype` usually is `float`. So you need to cast as follows: `cl_mem -> void* -> float*`
https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L114
https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L191-L199
@edgarriba
Exactly right :) (`cl_mem = _cl_mem* -> void* -> float*`)
Of course it's not a real `float*`, but neither is it in CUDA when a flat unified address space is not guaranteed (depending on the version).
Still not the prettiest solution, but convenient for CUDA and OpenCL interoperability. As long as there is no "universal" standard for virtual pointers across the frameworks, this is the solution that will be used.
Yes, as you can see from the comments I've referenced, it is not the most intuitive solution, but no alternative seems to have emerged.
Fabian,
Can you fill in lines 71 onwards in my example code? I guess that calling convolutions in your library is relatively straightforward for someone who knows the library well, so you could fill that in in a couple of minutes or so?
@hughperkins Sure, can you point me to the exact file I have to modify?
(So there are three `cl_mem`s: one for input, one for filters, one for output; an OpenCL queue, an OpenCL device, an OpenCL context; and the geometry is in the variables `batchSize`, `size` (meaning width and height, it's square), `kernelSize` (again, square), and `planes` (assumes input and output planes/channels are identical).)
(It doesn't have to build/compile; as long as it's close enough, I can handle any typos and so on. That is much easier than actually figuring out exactly which classes/methods to call etc :-) )
How to run a convolution forwardprop, input gradient, weights gradient?
Assume I have a `cl_mem` for input, weights, gradOutput, output, gradInput, and gradWeights, an OpenCL queue, probably an OpenCL context, and metadata representing the tensor dimensions. What to do next?
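Reading the libdnn_conv_layer.cpp linked at the top of this thread, the calls appear to be roughly as follows. This is a non-compilable sketch from memory of that file, so the exact method names and argument order should be checked against the source:

```cpp
// Forward: bottom/weight/bias in, top out, plus the batch size.
libdnn_->Forward(bottom_data, weight, bias, top_data, batch_size);

// Backward: the two flags select whether data and/or weight gradients are
// computed; gradients land in bottom_diff / weight_diff / bias_diff.
libdnn_->Backward(propagate_down_data, propagate_down_weights,
                  top_data, top_diff,
                  weight, weight_diff,
                  bias, bias_diff,
                  bottom_data, bottom_diff,
                  batch_size);
```

Each pointer here would be one of your `cl_mem` handles pushed through the `cl_mem -> void* -> float*` cast described earlier in the thread.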