Easy to run kernels using OpenCL. (renamed from OpenCLHelper)
Imagine we have a kernel with the following signature, in the file /tmp/foo.cl:
kernel void my_kernel( int N, global float *one, global float *two, local float *one_local, global float *result ) {
// kernel code here...
}
... then we can call it like:
#include "EasyCL.h"
if( !EasyCL::isOpenCLAvailable() ) {
cout << "opencl library not found" << endl;
exit(-1);
}
EasyCL *cl = EasyCL::createForFirstGpu();
CLKernel *kernel = cl->buildKernel("somekernelfile.cl", "test_function");
int in[5];
int out[5];
for( int i = 0; i < 5; i++ ) {
in[i] = i * 3;
}
kernel->in( 5, in );
kernel->out( 5, out );
kernel->run_1d( 5, 5 ); // global workgroup size = 5, local workgroup size = 5
delete kernel;
// use the results in 'out' array here
More generally, you can call on 2d and 3d workgroups by using the kernel->run
method:
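size_t local_ws[1]; local_ws[0] = 512; // not const, since the element is assigned after declaration
size_t global_ws[1]; global_ws[0] = EasyCL::roundUp(local_ws[0], size); // 'size' is the total number of work items you need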
kernel->run( 1, global_ws, local_ws ); // 1 is number of dimensions, could be 2, or 3
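The rounding step matters because OpenCL 1.x expects the global work size to be a multiple of the local work size. The following is only a sketch of that arithmetic: `roundUpToMultiple` is a hypothetical stand-alone helper, not EasyCL's own `roundUp`, and it assumes the intended result is the smallest multiple of the quantization size that is at least the desired total.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from the EasyCL source): returns the smallest
// multiple of quantizationSize that is >= desiredTotalSize.
std::size_t roundUpToMultiple(std::size_t quantizationSize, std::size_t desiredTotalSize) {
    assert(quantizationSize > 0); // a zero workgroup size would divide by zero
    return ((desiredTotalSize + quantizationSize - 1) / quantizationSize) * quantizationSize;
}
```

Since the rounded-up global size can exceed the number of items you actually have, the kernel would normally guard against the extra work items, eg by returning early when its global id is past the end of the data.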
'Fluent' style is also possible, eg:
kernel->in(10)->in(5)->out( 5, outarray )->run_1d( 5, 5 );
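For readers unfamiliar with the pattern, 'fluent' chaining works when each argument-setting method returns a pointer to the object it was called on. The class below is a hypothetical stand-in that only records arguments (it makes no OpenCL calls and is not how CLKernel is implemented); the `run_1d` body is a placeholder so that the chain has something observable to do.

```cpp
#include <vector>

// Hypothetical stand-in for a kernel object; it only records arguments.
class FluentKernelSketch {
public:
    // Each setter returns `this`, so calls can be chained with `->`.
    FluentKernelSketch *in(int value) {
        scalarArgs.push_back(value);
        return this;
    }
    FluentKernelSketch *out(int count, int *array) {
        outputCount = count;
        outputArray = array;
        return this;
    }
    // Placeholder for launching: writes the number of scalar inputs into
    // every element of the output array.
    void run_1d(int /*global_ws*/, int /*local_ws*/) {
        for (int i = 0; i < outputCount; i++) {
            outputArray[i] = static_cast<int>(scalarArgs.size());
        }
    }
private:
    std::vector<int> scalarArgs;
    int outputCount = 0;
    int *outputArray = nullptr;
};
```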
If you use EasyCL::createForFirstGpu(), EasyCL will bind to the first OpenCL-enabled GPU (or accelerator) that it finds. If you want to use a different device, or an OpenCL-enabled CPU, you can use one of the following methods:
EasyCL::createForIndexedGpu( int gpuindex ); // looks for opencl-enabled gpus, and binds to the (gpuindex+1)th one
EasyCL::createForFirstGpuOtherwiseCpu();
EasyCL::createForPlatformDeviceIndexes( int platformIndex, int deviceIndex );
EasyCL::createForPlatformDeviceIds( int platformId, int deviceId ); // you can get these ids by running `gpuinfo` first
You can run gpuinfo to get a list of platforms and devices on your system.
You can use the environment variable CL_GPUOFFSET to choose a GPU. It shifts the gpu numbering downwards by this offset, ie with an offset of 1, gpu index 1 becomes 0, and index 2 becomes 1. For example, if a program uses gpu index 0 by default, setting CL_GPUOFFSET to 1 will choose the second gpu, and setting it to 2 will choose the third gpu.
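To make the index arithmetic concrete, here is a small sketch. Both functions are hypothetical and are not taken from the EasyCL source; they assume the offset is simply added to the index the program asks for, which matches the examples in the text.

```cpp
#include <cstdlib>

// Hypothetical helper: combines the gpu index a program asks for with an
// offset string such as the value of CL_GPUOFFSET.
// A null or empty string means "no offset".
int physicalGpuIndex(int requestedIndex, const char *offsetText) {
    int offset = 0;
    if (offsetText != nullptr && offsetText[0] != '\0') {
        offset = std::atoi(offsetText); // atoi yields 0 if the text is not a number
    }
    return requestedIndex + offset;
}

// Reads the offset from the environment (std::getenv returns nullptr if unset).
int physicalGpuIndexFromEnv(int requestedIndex) {
    return physicalGpuIndex(requestedIndex, std::getenv("CL_GPUOFFSET"));
}
```

So a program that asks for index 0 would end up on the second gpu when the offset text is "1", and on the third when it is "2".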
There are some examples in the test subdirectory, eg ( testfloatwrapper, main ) of testfloatwrapper.cpp.
// constructor:
EasyCL::EasyCL();
// choose different gpu index
void EasyCL::gpu( int gpuindex );
// compile kernel
CLKernel *EasyCL::buildKernel( string kernelfilepath, string kernelname, string options = "" );
// Note that you pass `#define`s in through the `options` parameters, like `-D TANH`, or `-D TANH -D BIASED`
// passing arguments to kernel:
CLKernel::in( int integerinput );
CLKernel::in( int arraysize, const float *inputarray ); // size in number of floats
CLKernel::in( int arraysize, const int *inputarray ); // size in number of ints
CLKernel::out( int arraysize, float *outputarray ); // size in number of floats
CLKernel::out( int arraysize, int *outputarray ); // size in number of ints
CLKernel::inout( int arraysize, float *inoutarray ); // size in number of floats
CLKernel::inout( int arraysize, int *inoutarray ); // size in number of ints
// to allocate local arrays, as passed-in kernel parameters:
CLKernel::localFloats( int localarraysize ); // size in number of floats
CLKernel::localInts( int localarraysize ); // size in number of ints
// running kernel, getting result back, and cleaning up:
CLKernel::run_1d( int global_ws, int local_ws );
CLKernel::run( int number_dimensions, size_t *global_ws, size_t *local_ws );
// helper function:
EasyCL::roundUp( int quantizationSize, int desiredTotalSize );
To make it possible to reuse data between kernels, without moving back to PC main memory, and back onto the GPU, you can use CLWrapper objects.
These can be created on the GPU, or on the host, and moved backwards and forwards between each other, as required. They can be passed as an 'input' and 'output' to a CLKernel object. They can be reused between kernels.
There are two 'flavors':
- CLArray: more automated, but involves more memory copying
- CLWrapper: wraps an existing native array, so needs less memory copying, but you need to call copyToDevice() and copyToHost() yourself
CLArray objects are the first implementation. CLWrapper objects are the second implementation. You can use either, but note that CLWrapper objects are the ones that I use myself.
Compared to CLArray objects, CLWrapper objects need less memory copying, since they wrap an existing native array, but you will need to call copyToDevice() and copyToHost() yourself.
if( !EasyCL::isOpenCLAvailable() ) {
cout << "opencl library not found" << endl;
exit(-1);
}
cout << "found opencl library" << endl;
EasyCL cl;
CLKernel *kernel = cl.buildKernel("../test/testeasycl.cl", "test_int");
int in[5];
for( int i = 0; i < 5; i++ ) {
in[i] = i * 3;
}
int out[5];
CLWrapper *inwrapper = cl.wrap(5, in);
CLWrapper *outwrapper = cl.wrap(5, out);
inwrapper->copyToDevice();
kernel->in( inwrapper );
kernel->out( outwrapper );
kernel->run_1d( 5, 5 );
outwrapper->copyToHost();
assertEquals( out[0] , 7 );
assertEquals( out[1] , 10 );
assertEquals( out[2] , 13 );
assertEquals( out[3] , 16 );
assertEquals( out[4] , 19 );
cout << "tests completed ok" << endl;
Can copy between buffers (New!):
wrapper1->copyTo( wrapper2 );
CLWrapper objects are currently available as CLIntWrapper and CLFloatWrapper.
Compared to CLWrapper objects, CLArray objects are more automated, but involve more memory copying.
EasyCL cl;
CLArrayFloat *one = cl.arrayFloat(10000); // create CLArray object for 10,000 floats
(*one)[0] = 5; // give some data...
(*one)[1] = 7;
CLArrayFloat *two = cl.arrayFloat(10000);
// pass to kernel:
kernel->in(one)->out(two);
You can then take the 'two' CLArray object, and pass it as the 'input' to a different kernel, or you can use operator[] to read values from it.
Currently, CLArray is available as 'CLArrayFloat' and 'CLArrayInt'.
You can store kernels in the store, each under a unique name, to facilitate kernel caching:
// store:
cl->storeKernel( "mykernelname", somekernel ); // name must be not used yet
// check exists:
cl->kernelExists( "mykernelname" );
// retrieve:
CLKernel *kernel = cl->getKernel( "mykernelname" );
New: you can transfer kernel ownership to the EasyCL object, by passing a third parameter, deleteWithCl = true. Then, when the EasyCL object is deleted, the kernel is deleted too.
// store:
cl->storeKernel( "mykernelname", somekernel, true ); // this kernel will be deleted when
// `cl` object is deleted
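The store-and-ownership idea can be sketched with a plain map from name to kernel pointer plus an ownership flag. Everything below is hypothetical: `KernelSketch` and `KernelStoreSketch` are illustrative stand-ins, and throwing on a duplicate or missing name is an assumption made for the sketch, not something confirmed for EasyCL.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a compiled kernel.
struct KernelSketch {
    explicit KernelSketch(int *deleteCounter) : deleteCounter(deleteCounter) {}
    ~KernelSketch() { if (deleteCounter) ++*deleteCounter; }
    int *deleteCounter; // incremented on destruction, so ownership can be observed
};

// Hypothetical store: maps unique names to kernels, and deletes only the
// kernels whose ownership was handed over.
class KernelStoreSketch {
public:
    KernelStoreSketch() = default;
    KernelStoreSketch(const KernelStoreSketch &) = delete;            // avoid double deletes
    KernelStoreSketch &operator=(const KernelStoreSketch &) = delete;
    ~KernelStoreSketch() {
        for (auto &entry : kernels) {
            if (entry.second.owned) delete entry.second.kernel;
        }
    }
    void storeKernel(const std::string &name, KernelSketch *kernel, bool deleteWithStore = false) {
        if (kernelExists(name)) throw std::runtime_error("kernel name already used: " + name);
        kernels[name] = Entry{kernel, deleteWithStore};
    }
    bool kernelExists(const std::string &name) const { return kernels.count(name) != 0; }
    KernelSketch *getKernel(const std::string &name) const {
        auto it = kernels.find(name);
        if (it == kernels.end()) throw std::runtime_error("kernel not found: " + name);
        return it->second.kernel;
    }
private:
    struct Entry { KernelSketch *kernel; bool owned; };
    std::map<std::string, Entry> kernels;
};
```

The point of the flag is that kernels stored without it stay the caller's responsibility, while kernels stored with it are cleaned up together with the store.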
For CLWrapper objects, if the wrapper is passed to a kernel via out or inout, and then that kernel is run, then isDeviceDirty() will return true, until ->copyToHost() is called. So, you can use this to determine whether you need to run ->copyToHost() prior to reading the host-side array.
The following methods will reset the flag to false:
- copyToDevice()
- copyToHost()
This is a new feature, as of May 15 2015, and might have some bugs prior to May 31 2015 (ie, about 2 weeks, long enough for me to find any bugs).
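The flag behaviour described above can be modelled in a few lines. This is a sketch of the state transitions only: `DirtyFlagSketch` is a hypothetical class that performs no OpenCL calls, and `markWrittenByKernel` stands in for "a kernel that received this wrapper via out or inout has been run".

```cpp
// Hypothetical model of the flag only; it performs no OpenCL calls.
class DirtyFlagSketch {
public:
    // Stand-in for: a kernel that received this wrapper via out/inout has run.
    void markWrittenByKernel() { deviceDirty = true; }
    // Stand-ins for the two copy methods; both reset the flag to false.
    void copyToDevice() { deviceDirty = false; }
    void copyToHost() { deviceDirty = false; }
    bool isDeviceDirty() const { return deviceDirty; }
private:
    bool deviceDirty = false;
};

// Copies back only when the device holds newer data; returns true if it copied.
bool syncToHostIfNeeded(DirtyFlagSketch &wrapper) {
    if (!wrapper.isDeviceDirty()) return false;
    wrapper.copyToHost();
    return true;
}
```

With real wrappers, the same check-then-copy shape would let you skip a device-to-host transfer when nothing has written to the buffer since the last copy.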
{{some_variable_name}}
{% for i=0,5 do %}... code here ... {% end %}
- #include the new "CLKernel_structs.h" header, in order to be able to pass structs
- call cl->setProfiling(true);, then run your kernels as normal, then call cl->dumpProfiling to print the results
- call ->getBuffer() on a CLWrapper object, in order to pass it to clBLAS. You can see an example eg at THClBlas.cpp#L425
Option | Description |
---|---|
PROVIDE_LUA_ENGINE | If you want to call EasyCL from within Lua, then choose option `PROVIDE_LUA_ENGINE=OFF`, otherwise leave it as `ON` |
DEV_RUN_COG | Only for EasyCL maintainers, leave as OFF otherwise |
BUILD_TESTS | whether to build unit tests |
(tested on Travis https://travis-ci.org/hughperkins/EasyCL )
git clone --recursive https://github.com/hughperkins/EasyCL.git
cd EasyCL
mkdir build
cd build
cmake ..
make install
The executables will be in the ../dist/bin folder, and the .dylib files in ../dist/lib.
Don't forget the --recursive, otherwise you will see odd errors about clew/src/clew.c missing. If this happens, you can try git submodule init and then git submodule update.
git clone --recursive https://github.com/hughperkins/EasyCL.git
cd EasyCL
mkdir build
cd build
cmake ..
make install
The executables will be in the ../dist/bin folder, and the .so files in ../dist/lib.
Don't forget the --recursive, otherwise you will see odd errors about clew/src/clew.c missing. If this happens, you can try git submodule init and then git submodule update.
git clone --recursive https://github.com/hughperkins/EasyCL.git
- create a subdirectory build-win32, or build-win64, according to which platform you are building for
- configure, and choose an appropriate build platform, eg visual studio 2013, or visual studio 2013 win64
- generate
- open the build-win32 or build-win64 build directory
- change Debug to Release
- from the build menu, choose build solution
- copy the files the tests need from the test directory into the directory where you will run the tests from (if you can figure out a way to automate this, please send a pull request :-) )
To check clew library is working ok (ie finding and loading the opencl library, etc):
Linux:
LD_LIBRARY_PATH=../dist/lib ../dist/bin/gpuinfo
Windows:
../dist/bin/gpuinfo
... should print some information about your graphics card
Unit-tests:
Linux:
LD_LIBRARY_PATH=../dist/lib ../dist/bin/easycl_unittests
Windows:
../dist/bin/easycl_unittests
- Run clinfo (install via sudo apt-get install clinfo), to check the OpenCL installation itself is ok. If this says 'no installations found', then it's an OpenCL configuration issue.
- clinfo is broken on CUDA, I think? But OpenCL will still work ok: try gpuinfo instead
- gpuinfo to list available platforms and devices
- kernel->run a bit faster
- CL_GPUOFFSET, which lets you choose a GPU, by setting this var to 1,2,3, ...
- CLQueue, containing EasyCL::queue cl_command_queue
- easycl now. Since it's a breaking change, in terms of compatibility, I've bumped the major version number
- int64_t and uint64_t, instead of long long and unsigned long long. This is configurable in cmake options, though the default is that the typedef changes. I'm not 100% sure if changing the default is a good idea, but it seems better than having int64 and int64_t be two different types...
- git submodule init and git submodule update to download it
- -f option to git, or delete the thirdparty/clew directory first
- , srcOffset, dstOffset, count
- CLWrapper->copyTo() method
- CLKernels
- out or inout, and that kernel is run
- storeKernel( string name, CLKernel *kernel ), getKernel( string name ), kernelExists( string name ), to facilitate per-connection kernel caching
- getCl() to CLWrapper types
EasyCL is available under MPL v2 license, http://mozilla.org/MPL/2.0/