Closed hughperkins closed 8 years ago
@hughperkins The best example on usage I can give you at the moment is here: https://github.com/naibaf7/caffe/blob/master/src/caffe/layers/libdnn_conv_layer.cpp (note that the last part of the code, the "tune" method is experimental at this stage).
Ok. `top` is input, and `bottom` is output? Presumably they are both NCHW? What about the layout of weights? I guess, without thinking too hard, bias is just `Co`, where `Co` is 'output channels'?
Hmmm, I think I might plug it into https://github.com/hughperkins/neon-benchmarks first, since it both measures performance, and checks correctness. However, needs to be callable from python. Probably easiest would be to use ffi, via a C interface? I might just write a C wrapper myself, should be relatively straightforward?
@hughperkins Yeah, if you can copy the data into cl_mem and create a valid ViennaCL context from your OpenCL context (methods for that are included in libdnn) it should be rather easy to test.
A ViennaCL context? I already have an OpenCL context created by PyOpenCL. Can I use that? Does using the ViennaCL context imply that the queues will be different, or can I use the same queue on both the PyOpenCL and ViennaCL sides?
@hughperkins This should help you:
```cpp
#ifdef USE_OPENCL
void device::setupViennaCLContext(int id, const &cl_context ctx, const &cl_device dev, const &cl_command_queue queue) {
  viennacl::ocl::setup_context(id, ctx, dev, queue);
}
#endif
```
This creates a ViennaCL context with the provided id and the OpenCL context, device and queue that already exist in your PyOpenCL setup. (https://github.com/naibaf7/libdnn/blob/master/src/device.cpp)
Excellent! Sounds good. Will take a look. Will close the issue for now :-)
Hi. Ok, I'm being a bit slow I know. Ok, so far what I have is:
build.sh, this is what I run to do the build:
```bash
#!/bin/bash
echo '###############'
g++ -std=c++11 -Iinclude -Ithirdparty/ViennaCL-1.7.1 -o test test.cpp -lOpenCL
```
This file is in the root of the cloned 'libdnn' repo.
Note that ViennaCL is in the thirdparty/ViennaCL-1.7.1 subdirectory of the libdnn repo.
test.cpp:
```cpp
#include <algorithm>
#include <vector>

#include "CL/cl.h"

#define USE_OPENCL
#include "libdnn.hpp"

using namespace std;
using namespace greentea;

// #ifdef USE_OPENCL
// void device::setupViennaCLContext(int id, const &cl_context ctx, const &cl_device dev, const &cl_command_queue queue) {
//   viennacl::ocl::setup_context(id, ctx, dev, queue);
// }
// #endif

#define Dtype float

int main(int argc, char *argv[]) {
  cl_int err;
  cl_platform_id platform = 0;
  cl_device_id cl_device = 0;
  cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, 0, 0 };
  cl_context ctx = 0;
  cl_command_queue queue = 0;

  err = clGetPlatformIDs(1, &platform, NULL);
  if (err != CL_SUCCESS) {
    printf("clGetPlatformIDs() failed with %d\n", err);
    return 1;
  }
  cout << "got platforms" << endl;

  err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &cl_device, NULL);
  if (err != CL_SUCCESS) {
    printf("clGetDeviceIDs() failed with %d\n", err);
    return 1;
  }

  props[1] = (cl_context_properties)platform;
  ctx = clCreateContext(props, 1, &cl_device, NULL, NULL, &err);
  if (err != CL_SUCCESS) {
    printf("clCreateContext() failed with %d\n", err);
    return 1;
  }

  queue = clCreateCommandQueue(ctx, cl_device, 0, &err);
  if (err != CL_SUCCESS) {
    printf("clCreateCommandQueue() failed with %d\n", err);
    clReleaseContext(ctx);
    return 1;
  }

  cl_mem bufA, bufB, bufC;
  bufA = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);
  bufB = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);
  bufC = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 10240, NULL, &err);

  int id = 123;
  device::setupViennaCLContext(id, ctx, cl_device, queue);

  device mydevice;
  shared_ptr<LibDNNConv<Dtype> > libdnn_;

  LibDNNConfig config;
  config.dev_ptr = mydevice;
  config.in_shape = std::vector<int_tp>(3, 1);
  config.out_shape = std::vector<int_tp>(3, 1);
  config.kernel = std::vector<int_tp>(1, 1);
  config.pad = std::vector<int_tp>(1, 0);
  config.stride = std::vector<int_tp>(1, 1);
  config.dilation = std::vector<int_tp>(1, 0);
  config.group = 1;
  config.bias_term = false;
  config.fast_unsafe_math = false;
  config.weights_backward = bufA;
  config.bias_backward = bufB;
  // if (std::is_same<Dtype, float>::value ||
  //     this->device_->CheckCapability("cl_khr_int64_base_atomics")) {
  config.wgalgo = LIBDNN_CONVOLUTION_WG_ALGO_ATOMIC;
  config.bwalgo = LIBDNN_CONVOLUTION_BW_ALGO_COL2IM_ATOMIC;
  // } else {
  //   config.wgalgo = LIBDNN_CONVOLUTION_WG_ALGO_DIRECT;
  //   config.bwalgo = LIBDNN_CONVOLUTION_BW_ALGO_IM2COL;
  // }

  LibDNNConv<Dtype>* libdnn = new LibDNNConv<Dtype>(config);
  libdnn_.reset(libdnn);
  // delete libdnn;

  return 0;
}
```
Result of running `build.sh`:
```text
ubuntu@peach:~/torch-cl/opencl/libdnn$ ./build.sh
###############
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:51: error: ISO C++ forbids declaration of ‘cl_context’ with no type [-fpermissive]
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                   ^
include/device.hpp:24:62: error: expected ‘,’ or ‘...’ before ‘ctx’
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                              ^
include/device.hpp:51:3: error: ‘viennacl’ does not name a type
   viennacl::ocl::program ocl_program_;
   ^
test.cpp: In function ‘int main(int, char**)’:
test.cpp:63:59: error: no matching function for call to ‘greentea::device::setupViennaCLContext(int&, _cl_context*&, _cl_device_id*&, _cl_command_queue*&)’
   device::setupViennaCLContext(id, ctx, cl_device, queue);
                                                          ^
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:15: note: candidate: static void greentea::device::setupViennaCLContext(int, const int&)
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
               ^
include/device.hpp:24:15: note: candidate expects 2 arguments, 4 provided
test.cpp:69:20: error: cannot convert ‘greentea::device’ to ‘greentea::device*’ in assignment
   config.dev_ptr = mydevice;
                    ^
```
Thoughts? I know I'm kind of swimming here...
@hughperkins OK, so, these things: pass `&mydevice` where a `device*` is expected (i.e. `config.dev_ptr = &mydevice;`), and the parameter `const &cl_context ctx` should just be `const cl_context &ctx`.
Note that the first fatal error is simply because of including libdnn.hpp :
```cpp
#define USE_OPENCL
#include "libdnn.hpp"
```
Causes:
```text
In file included from include/libdnn.hpp:7:0,
                 from test.cpp:6:
include/device.hpp:24:51: error: ISO C++ forbids declaration of ‘cl_context’ with no type [-fpermissive]
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                   ^
include/device.hpp:24:62: error: expected ‘,’ or ‘...’ before ‘ctx’
   static void setupViennaCLContext(int id, const &cl_context ctx, const &cl_dev
                                                              ^
include/device.hpp:51:3: error: ‘viennacl’ does not name a type
   viennacl::ocl::program ocl_program_;
```
Don't suppose... could you write a short, complete example of taking two (empty) `cl_mem` buffers, one representing weights and one representing input, and convolving them?
@hughperkins Yep, I will do that when I find time for it... a lot going on at the moment.
So you see, I'm quite in a mess considering I have to do all that alone :)
With GSoC we can help you to port kernels from caffe. /cc @edgarriba
Once libdnn package exporting is working we can start with that.
Question: possible to migrate your convnet-benchmarks PR to work with libdnn? Then I can probably just base off of that.
@hughperkins What do you mean by that? Do you need example usage code? The two go-tos here that work now are tiny-cnn and Caffe. https://github.com/edgarriba/tiny-cnn
The convnet-benchmark for libdnn is done through Caffe.
I looked through tiny-cnn, but couldn't quite figure out where to find the magic bits of code that actually do the convolutions.
@hughperkins take a look here https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L69
It's a minimal example that needs further optimization when setting the batch size and transferring the cl_mem to the next operations. But at least it works.
Ok, this compiles and runs, so:
Next up, create the buffers. I guess I need to create some kind of ViennaCL iterator-type objects?
Oh, maybe the buffers are just `cl_mem`s:

```cpp
kernel(WrapHandle((cl_mem)bottom_data, &ctx),
       WrapHandle((cl_mem)weight, &ctx),
       WrapHandle((cl_mem)bias, &ctx),
       WrapHandle((cl_mem)top_data, &ctx));
```
So, what should `Dtype` be? If `bottom_data` is both a `const Dtype *` and a `cl_mem`, then `Dtype *` is `cl_mem`, and `Dtype` is... ummm... well, it's not a pointer to `cl_mem`, since that'd be the wrong way around, so... ???? ... Ok, in cl.h, `cl_mem` is:

```cpp
typedef struct _cl_mem * cl_mem;
```

So, `Dtype` is `struct _cl_mem`?
See comments starting from https://github.com/BVLC/caffe/issues/4155#issuecomment-235441995
@hughperkins
You are right, buffers have to be created as `cl_mem`; however, LibDNN requires `const Dtype*`, where `Dtype` usually is `float`. So you need to cast as follows: `cl_mem -> void* -> float*`
https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L114
https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h#L191-L199
@edgarriba
Exactly right :) (`cl_mem = _cl_mem* -> void* -> float*`)
Of course it's not a real `float*`, but neither is it in CUDA when a flat unified address space is not guaranteed (depending on the version).
Still not the prettiest solution, but convenient for CUDA and OpenCL interoperability. As long as there is no "universal" standard for virtual pointers across the frameworks, this is the solution that will be used.
Yes, as you can see from the comments I've referenced, it is not the most intuitive solution, but no alternative seems to have emerged.
Fabian,
Can you fill in lines 71 onwards in my example code? I guess that calling convolutions in your library is relatively straightforward for someone who knows the library well, so you could fill that in in a couple of minutes or so?
@hughperkins Sure, can you point me to the exact file I have to modify?
(So there are three `cl_mem`s: one for input, one for filters, one for output; an OpenCL queue, an OpenCL device, an OpenCL context; and the geometry is in the variables `batchSize`, `size` (meaning width and height, it's square), `kernelSize` (again, square), and `planes` (assumes input and output planes/channels are identical).)
(It doesn't have to build/compile; as long as it's close enough, I can handle any typos and so on. That is much easier than actually figuring out exactly which classes/methods to call etc :-) )
How to run a convolution forwardprop, input gradient, weights gradient?
Assume I have a `cl_mem` for input, weights, gradOutput, output, gradInput, and gradWeights, an OpenCL queue, probably an OpenCL context, and metadata representing the tensor dimensions. What to do next?
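Reading the libdnn_conv_layer.cpp linked at the top of this thread, the calls appear to be roughly as follows. This is a non-compilable sketch from memory of that file, so the exact method names and argument order should be checked against the source:

```cpp
// Forward: bottom/weight/bias in, top out, plus the batch size.
libdnn_->Forward(bottom_data, weight, bias, top_data, batch_size);

// Backward: the two flags select whether data and/or weight gradients are
// computed; gradients land in bottom_diff / weight_diff / bias_diff.
libdnn_->Backward(propagate_down_data, propagate_down_weights,
                  top_data, top_diff,
                  weight, weight_diff,
                  bias, bias_diff,
                  bottom_data, bottom_diff,
                  batch_size);
```

Each pointer here would be one of your `cl_mem` handles pushed through the `cl_mem -> void* -> float*` cast described earlier in the thread.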