hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

OpenCL build error on Activation kernel #57

Closed maged closed 8 years ago

maged commented 8 years ago

I'm getting an OpenCL kernel build error when compiling activate.cl. I'm using PyDeepCL's NetdefToNet.createNetFromNetdef with the architecture: rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-7n.

Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
initializing clblas
cl/activate.cl build log: 
<built-in>:13:9: error: macro names must be identifiers
#define <C8><U+000F><EB><U+0003> 1
        ^
<built-in>:23:9: error: macro names must be identifiers
#define <C8><U+000F><EB><U+0003> 1
        ^

kernel build error:

kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6: 
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9: 
10: #ifdef TANH
11:     #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13:     #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15:     #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17:     #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19:     #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21:     #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23: 
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26:     const int globalId = get_global_id(0);
27:     if (globalId >= N) {
28:         return;
29:     }
30:     inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33: 
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36:     const int globalId = get_global_id(0);
37:     if (globalId >= N) {
38:         return;
39:     }
40:     out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43: 
44: 

Debugging this, it looks like it's caused by a broken options string ("" -DgOutputSize=32 -DgOutputSizeSquared=1024 -DgInputSize=32 -DgInputSizeSquared=1024 -DgNumPlanes=8 -D \230zt\002"), which is in turn caused by an ActivationFunction that has been optimized out (according to gdb).
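The "macro names must be identifiers" error follows directly from the stray bytes after the final -D in that options string. As a rough illustration (a hypothetical helper, not part of DeepCL or EasyCL), an options string can be checked before it reaches the OpenCL compiler by confirming that every -D name is a plain C identifier. The garbage bytes in the usage lines are the ones from the gdb output above; everything else is assumed for the sketch.

```python
import re

# Hypothetical helper for illustration only; not part of DeepCL or EasyCL.
# A C preprocessor macro name must be an identifier: letter or underscore,
# followed by letters, digits, or underscores.
_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def find_bad_defines(options):
    """Return the -D macro names in a build options string that are not
    valid C identifiers. Handles both "-DNAME=value" and "-D NAME"."""
    bad = []
    tokens = options.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == '-D':
            # "-D NAME" form: the name is the next token, if there is one
            name = tokens[i + 1] if i + 1 < len(tokens) else ''
            i += 1
        elif token.startswith('-D'):
            # "-DNAME" form: the name follows the flag directly
            name = token[2:]
        else:
            i += 1
            continue
        name = name.split('=', 1)[0]  # drop any "=value" part
        if _IDENTIFIER.fullmatch(name) is None:
            bad.append(name)
        i += 1
    return bad


if __name__ == '__main__':
    good = '-DgOutputSize=32 -DgNumPlanes=8 -D RELU'
    broken = '-DgOutputSize=32 -DgNumPlanes=8 -D \x98zt\x02'
    print(find_bad_defines(good))
    print(find_bad_defines(broken))
```

A check like this only flags the symptom. It does not explain where the garbage name comes from.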

hughperkins commented 8 years ago

Hi. You have the same GPU as me, so that's convenient for testing :-) Question: can you provide a minimal python script to reproduce the problem please? Are you using python2.7 or python 3.something?

hughperkins commented 8 years ago

(Like, for example, if I try:

from __future__ import print_function
import array
import PyDeepCL
import sys
print('imports done')

if len(sys.argv) != 2:
    print(
        'usage: python ' + sys.argv[0] +
        ' [mnist data directory (containing the .mat files)]')
    sys.exit(-1)

mnistFilePath = sys.argv[1] + '/t10k-images-idx3-ubyte'

cl = PyDeepCL.DeepCL()

print('compute units:', cl.getComputeUnits())
print('local memory size, bytes:', cl.getLocalMemorySize())
print('local memory size, KB:', cl.getLocalMemorySizeKB())
print('max workgroup size:', cl.getMaxWorkgroupSize())
print('max alloc size MB:', cl.getMaxAllocSizeMB())

net = PyDeepCL.NeuralNet(cl, 1, 28)
print('created net')
print(net.asString())
print('printed net')
net.addLayer(PyDeepCL.NormalizationLayerMaker().translate(-0.5).scale(1/255.0))
print('added layer ')
PyDeepCL.NetdefToNet.createNetFromNetdef(
    net, "rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n")
print(net.asString())

(N, planes, size) = PyDeepCL.GenericLoader.getDimensions(mnistFilePath)
print((N, planes, size))

N = 1280
images = array.array('f', [0] * (N * planes * size * size))
labels = array.array('i', [0] * N)
PyDeepCL.GenericLoader.load(mnistFilePath, images, labels, 0, N)
print('loaded data')

sgd = PyDeepCL.SGD(cl, 0.002, 0.0)
print('created SGD')
sgd.setWeightDecay(0.0001)
netLearner = PyDeepCL.NetLearner(
    sgd, net,
    N, images, labels,
    N, images, labels,
    128)
print('created netLearner')
netLearner.setSchedule(12)
netLearner.run()
print('done, cleaning up...')

... this works ok for me. But it could be a difference in our compilation options, perhaps.)

hughperkins commented 8 years ago

This sounds like a problem I ran into before and have already fixed, by changing ActivationFunction::getDefineName() to return const char * instead of string. Can you confirm your git log, please?

git log -n 5 --oneline

On my box I get something like:

d45b6de remove include_dirs
573deb7 remove mac build warning for softmaxlayer
f4fad72 remove build warning on mac
e3fb8c7 remove warning on Mac
8dff0d1 remove mac os x build warnings

maged commented 8 years ago

Here's my git log:

573deb7 remove mac build warning for softmaxlayer
f4fad72 remove build warning on mac
e3fb8c7 remove warning on Mac
8dff0d1 remove mac os x build warnings
309c543 change directory in travis

And here's a script it will break on:

cl = PyDeepCL.DeepCL()

net = PyDeepCL.NeuralNet(cl, 1, 28)
net.addLayer(PyDeepCL.NormalizationLayerMaker().translate(-0.5).scale(1/255.0))
PyDeepCL.NetdefToNet.createNetFromNetdef(
    net, "rt2-8c5z-tanh-5n")

relu causes the same error as tanh above. I compiled with cmake -DCMAKE_BUILD_TYPE=Debug .. to keep fn from being optimized out, and the root issue looks like a segfault on any fn->foo() call. Although my repo is up to date, I may have broken another part of the library (I've recently been playing around a lot with the DeepCL/EasyCL library). I'll try starting from a new install of DeepCL and see if that resolves it.

Thanks for your fast reply!

maged commented 8 years ago

I'm getting the same error on a completely new install that is up to date with master:

d45b6de remove include_dirs
573deb7 remove mac build warning for softmaxlayer
f4fad72 remove build warning on mac
e3fb8c7 remove warning on Mac
8dff0d1 remove mac os x build warnings

hughperkins commented 8 years ago

Ok, interesting. Please can you confirm:

(Update: the test code above runs ok for me. So I guess it is related to some difference in our OS, or python version)

hughperkins commented 8 years ago

(Note: I added your test to my Mac OS X travis script, and it seems to work ok: https://travis-ci.org/hughperkins/DeepCL/builds/115800365#L2384 https://github.com/hughperkins/DeepCL/blob/travis-tweaks/travis/simpletest.py#L37 So it seems to work on Ubuntu 15.10 and Mac OS X. I guess that leaves ... Windows?)

hughperkins commented 8 years ago

(Updated the python wrappers slightly to implement __str__, to give more convincing output: https://travis-ci.org/hughperkins/DeepCL/builds/115806497#L2395 )

hughperkins commented 8 years ago

(added test for python2.7 too https://travis-ci.org/hughperkins/DeepCL/builds/115809055#L2413 https://travis-ci.org/hughperkins/DeepCL/builds/115809055#L2540 https://github.com/hughperkins/DeepCL/blob/travis-tweaks/.travis.yml#L65-L69 )

maged commented 8 years ago

I'm actually using Ubuntu 15.10 with Python 2.7.11 :: Anaconda 2.5.0 (64-bit). Not sure why this only appears on my system... I might've broken something arcane.

hughperkins commented 8 years ago

Hmmm, I'm using Ubuntu 15.10 too...

hughperkins commented 8 years ago

How about:

pip uninstall -y DeepCL

... and then do a fresh git clone of DeepCL, and rebuild/reinstall?
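Before uninstalling, it can help to see which copy of the module Python would actually import, since an older build left in site-packages could shadow a freshly compiled one. Whether that was the cause here is an assumption. The sketch below uses only the standard library. locate_module is a hypothetical helper name, and importlib.util.find_spec needs Python 3.4 or later (on Python 2.7, importing the module and printing its __file__ attribute gives similar information).

```python
import importlib.util


def locate_module(name):
    """Return the path Python would load the top-level module `name` from,
    or None if no such module is found on the import path."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == '__main__':
    # Prints None if PyDeepCL is not installed in this environment.
    print('PyDeepCL would be loaded from:', locate_module('PyDeepCL'))
```

If the printed path points somewhere other than the build you just installed, the older copy is the one being imported.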

maged commented 8 years ago

Worked :) Thanks for your help debugging (and building the library!)

hughperkins commented 8 years ago

Cool :-)