DeepCL unit tests failing on vivante gc2000

Mezzano commented 8 years ago

Hello,

we're trying to get DeepCL running on a Vivante GC2000 GPU on an i.MX6Q board (ARMv7) with Linux 3.10.31 and openSUSE 13.1.

The GPU supports OpenCL 1.1 EP (Embedded Profile).

DeepCL builds correctly, but when we run deepcl_unittests we get the following error:

dlopen failed: /usr/lib/libOpenCL.so: undefined symbol: gcoCL_DestroyTexture
Couldnt find OpenCL-enabled GPU: OpenCL library not found

Running nm on libOpenCL.so shows that it has undefined gc* symbols, such as gcoCL_DestroyTexture.

Can you help?

NKUCodingCat commented 8 years ago

Have you run source dist/bin/activate.sh before running the unit tests? In my experience on Windows, the path settings only apply to the current cmd window. Hope that helps.

Mezzano commented 8 years ago

Yes of course.

NKUCodingCat commented 8 years ago

From searching Google for libOpenCL.so, it seems that this library has to be installed separately on distros like Ubuntu or Debian, so it is probably not a binary shipped with DeepCL but one provided by the platform. Here is a guide to installing OpenCL on openSUSE 12, hope it helps: http://zpass.logdown.com/posts/79030-setting-up-opencl-on-opensuse-123

hughperkins commented 8 years ago

Googling around for gcoCL_DestroyTexture got me to https://community.nxp.com/thread/304961 . Looks like functions prefixed gcoCL are specific to Vivante devices. A hint in that thread mentions -lGAL. Googling for "-lGAL" turns up https://blog.visucore.com/2013/3/12/opencl-on-i-mx6 , which says:

"At least on my Freescale rootfs, libOpenCL needs functions from libGAL but is not linked against it. This means that -lGAL has to be added to the linker command line along with -lOpenCL."

Looks like you need to add -lGAL ... hmmm ... somewhere :-P ... Hmmm ... that sounds challenging. Maybe you can use LD_PRELOAD? You'll need to first find a file with a name like libGAL.so. Let's say it's in /usr/lib, then you'll do something like:

LD_PRELOAD=/usr/lib/libGAL.so ./deepcl_unittests

Can you see if you can find libGAL.so, and check whether a command line like this helps at all?
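
For reference, LD_PRELOAD has an in-process counterpart: a library opened with dlopen and RTLD_GLOBAL makes its symbols available to libraries that are loaded afterwards. The sketch below only illustrates that mechanism. The helper name preloadGlobal and the /usr/lib/libGAL.so path are assumptions, it is not how clew or DeepCL load OpenCL, and it may not be enough if libOpenCL.so depends on more than libGAL.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: open a shared library so that its symbols become
// visible to libraries loaded later (RTLD_GLOBAL). Returns true on success;
// on failure, stores the dlerror() message in *err.
bool preloadGlobal(const std::string &path, std::string *err) {
    dlerror();  // clear any stale error state
    void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        if (err != NULL) {
            *err = (msg != NULL) ? msg : "unknown dlopen error";
        }
        return false;
    }
    return true;  // the handle is intentionally kept open for the process lifetime
}

// Usage (the path is an assumption; adjust it to where the file lives on the board):
//   std::string err;
//   if (!preloadGlobal("/usr/lib/libGAL.so", &err)) { /* report err */ }
//   // ... only afterwards let the OpenCL loader dlopen libOpenCL.so
```

On older glibc versions this needs -ldl on the link line.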

Mezzano commented 8 years ago

Hi Hugh,

thanks for your input. We had indeed tried linking with -lGAL by adding it to the CMakeLists.txt file, but without success.

I just tried the LD_PRELOAD trick and it's no better, as you can see from the unit test output:

https://gist.github.com/Mezzano/4c6933f15969124775998bf4338abe0a

PS: We've added a debug line in the clew.c file as you can see

hughperkins commented 8 years ago

Well... can we start with getting clinfo to run? Can you confirm whether clinfo does/doesn't run?

Mezzano commented 8 years ago

./clinfo does run :)

https://gist.github.com/Mezzano/d3823eea0acfdbcb7b5ea7d90fccd2b7

We think our GPU is limited because it only supports OpenCL 1.1 EP.

Mezzano commented 8 years ago

@NKUCodingCat , thanks for your link but we are on an armv7l architecture.

hughperkins commented 8 years ago

Ok. Maybe we can start with something really simple/basic. Can you paste the following into a file simple.cpp? It's just something I found lying around on my hard drive that happens to run a simple OpenCL kernel:

#include <iostream>
#include <sstream>
#include <stdexcept>
using namespace std;

#include "CL/cl.hpp"

template<typename T>
std::string toString(T val ) { // not terribly efficient, but works...
   std::ostringstream myostringstream;
   myostringstream << val;
   return myostringstream.str();
}

void checkError( cl_int error ) {
    if (error != CL_SUCCESS) {
       throw std::runtime_error( "Error: " + toString(error) );
    }
}

int main( int argc, char *argv[] ) {

     cl_int error;  

    cl_device_id *device_ids;

    cl_uint num_platforms;
    cl_uint num_devices;

    cl_platform_id platform_id;
    cl_device_id device;

    cl_context context;
    cl_command_queue queue;
    cl_program program;

    checkError( clGetPlatformIDs(1, &platform_id, &num_platforms) );
    checkError(  clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device, &num_devices) );
    device_ids = new cl_device_id[num_devices];
    checkError( clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, num_devices, device_ids, &num_devices) );
    device = device_ids[0];
    context = clCreateContext(0, 1, &device, NULL, NULL, &error);
    checkError(error);
    queue = clCreateCommandQueue(context, device, 0, &error);
    checkError(error);

    string kernel_source = string( "kernel void test_read( const int one,  const int two, global int *out) {\n" ) +
    "    const int globalid = get_global_id(0);\n" +
    "    int sum = 0;\n" +
    "    int n = 0;\n" +
    "    while( n < 100000 ) {\n" +
    "        sum = (sum + one ) % 1357 * two;\n" +
    "        n++;\n" +
    "    }\n" +
    "    out[globalid] = sum;\n" +
    "}\n";
    const char *source_char = kernel_source.c_str();
    size_t src_size = strlen( source_char );
    program = clCreateProgramWithSource(context, 1, &source_char, &src_size, &error);
    checkError(error);

    checkError( clBuildProgram(program, 1, &device, 0, NULL, NULL) );

    cl_kernel kernel = clCreateKernel(program, "test_read", &error);
    checkError(error);

    const int N = 4500;
    int *out = new int[N];
    if( out == 0 ) throw runtime_error("couldnt allocate array");

    int c1 = 3;
    int c2 = 7;
    checkError( clSetKernelArg(kernel, 0, sizeof(int), &c1 ) );
    checkError( clSetKernelArg(kernel, 1, sizeof(int), &c2 ) );
    cl_mem outbuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * N, 0, &error);
    checkError(error);
    checkError( clSetKernelArg(kernel, 2, sizeof(cl_mem), &outbuffer) );

    size_t globalSize = N;
    size_t workgroupsize = 512;
    globalSize = ( ( globalSize + workgroupsize - 1 ) / workgroupsize ) * workgroupsize;
    checkError( clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &globalSize, &workgroupsize, 0, NULL, NULL) );
    checkError( clFinish( queue ) );
    checkError( clEnqueueReadBuffer( queue, outbuffer, CL_TRUE, 0, sizeof(int) * N, out, 0, NULL, NULL) );    
    checkError( clFinish( queue ) );

    for( int i = 0; i < N; i++ ) {
       if( out[i] != 4228 ) {
           cout << "out[" << i << "] != 4228: " << out[i] << endl;
           exit(-1);
       }
    }

    return 0;
}

Then run something like:

g++ -o simple simple.cpp -lOpenCL
./simple

On my box:

ubuntu@peach:/tmp$ g++ -o simple simple.cpp -lOpenCL
ubuntu@peach:/tmp$ ./simple 
ubuntu@peach:/tmp$
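
checkError in simple.cpp reports only the numeric status, which is hard to interpret at a glance. A small lookup like the following sketch could print a name next to the number. The function name clErrorName is made up for this illustration, the table covers only a handful of codes from the OpenCL headers, and it takes a plain int so that it compiles without those headers.

```cpp
#include <string>

// Illustrative helper (not part of simple.cpp, EasyCL or DeepCL): map a few
// OpenCL status codes to the names they have in CL/cl.h.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        default:  return "unlisted OpenCL error";
    }
}
```

In checkError, the message could then be built as "Error: " + toString(error) + " (" + clErrorName(error) + ")".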

Mezzano commented 7 years ago

Hi Hugh,

thanks again for taking the time to help us. I tried your code and compiled it with g++ 4.8.5. As you can see, I needed to add the cstdlib and cstdio headers, and then link with -lGAL. In the end I get a runtime error:

https://gist.github.com/Mezzano/58e576950aff0ae4590aabc4cbef9756

Mezzano commented 7 years ago

The error seems to be related to line 20 of the code:

checkError( clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &globalSize, &workgroupsize, 0, NULL, NULL) );

It looks as if the function does not accept one of its arguments.

Mezzano commented 7 years ago

The following code works:

https://gist.github.com/Mezzano/cf4dc8caf7f8ff9b148d51c4ea49be54

Mezzano commented 7 years ago

In fact, error -54 (CL_INVALID_WORK_GROUP_SIZE) is related to the workgroup size. Changing it to 256 works; 512 seems to be too big.
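
One way to avoid hard-coding a size that the device rejects is to clamp the requested workgroup size to the device limit before rounding up the global size. The sketch below shows only the arithmetic. The function names are hypothetical, and deviceMax is assumed to have been queried beforehand, for example with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE. The value 256 comes from the test reported here, not from Vivante documentation.

```cpp
#include <cstddef>

// Hypothetical helper: never request more work items per group than the
// device allows. deviceMax is assumed to come from a prior device query.
std::size_t clampWorkgroupSize(std::size_t requested, std::size_t deviceMax) {
    return requested < deviceMax ? requested : deviceMax;
}

// Round n up to the next multiple of workgroupSize, as simple.cpp does, so
// that the global size divides evenly into workgroups.
// Precondition: workgroupSize > 0.
std::size_t roundUpGlobalSize(std::size_t n, std::size_t workgroupSize) {
    return ((n + workgroupSize - 1) / workgroupSize) * workgroupSize;
}
```

With N = 4500 and a device limit of 256, clampWorkgroupSize(512, 256) gives 256 and roundUpGlobalSize(4500, 256) gives 4608. Since 4608 is larger than N, a kernel launched this way would normally also need a bounds check before writing to out.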

hughperkins commented 7 years ago

Ok. That's a good start. So, then, where next? Seems this line fails for you:

g++ -o simple simple.cpp -lOpenCL

with an error about an undefined reference to gcKERNEL_FUNCTION_GetName. That's unusual, since normally libOpenCL.so is a generic library that doesn't link to the driver itself. Am I right in thinking this libOpenCL.so is a proprietary libOpenCL.so, supplied by Vivante?

Mezzano commented 7 years ago

g++ -o simple simple.cpp -lOpenCL needs -lGAL added in order to link. The libOpenCL.so is indeed supplied by Vivante; we got it here for our kernel version 3.10.17: http://repository.timesys.com/buildsources/g/gpu-viv-bin-mx6q/

Mezzano commented 7 years ago

We've asked Khronos to supply us with an OpenCL 1.1 EP compatible library, but there has been no reaction from them: https://github.com/KhronosGroup/OpenCL-ICD-Loader/issues/7

hughperkins commented 7 years ago

Ok. I don't see Vivante in the list of Khronos members? https://www.khronos.org/members/

But I see no reason why we can't get this running using the proprietary libOpenCL.so file.

I'm kind of busy at work this week though. I do have a full-time job :-P

I think your next mission, if you choose to accept it, would be to look at getting https://github.com/hughperkins/EasyCL working on Vivante. It's a middleware library used in DeepCL to set up the GPUs, build kernels and so on. It's a lot simpler and easier to compile than DeepCL, and if you can get it running against the Vivante libOpenCL.so, then DeepCL will probably work too.

I would probably look at simply adding -lGAL into the CMakeLists.txt file, i.e. in a target_link_libraries, something like that. If that works, I can figure out an appropriate option to add to mainstream EasyCL etc., so that this works ok.

hughperkins commented 7 years ago

(Hmmm, you can probably mess around with the USE_CLEW option of EasyCL a bit too, actually. clew is an abstraction layer that means the program will build and run without OpenCL being present. Or rather, it will build without OpenCL being present. But in the case of EasyCL and DeepCL, it won't get very far at runtime without OpenCL being present, actually... )

Mezzano commented 7 years ago

Thanks for your input, but our version of libOpenCL.so seems to be the problem. We've contacted vivantecorp and hope to get some support from them.

hughperkins / DeepCL

DeepCL unit tests failing on vivante gc2000 #82