karpathy / char-rnn

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
11.59k stars 2.58k forks

OpenCL error #128

Closed zoinksbob closed 8 years ago

zoinksbob commented 8 years ago

Howdy! When I train with the CPU, everything works fine. But when I try to use OpenCL for my Radeon, I get the following:

```
$ th train.lua -opencl 1
using OpenCL on GPU 0...
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 362, val: 20, test: 0
vocab size: 95
creating an lstm with 2 layers
Using Mesa , OpenCL platform: Clover
Using OpenCL device: AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.0)
setting forget gate biases to 1 in LSTM layer 1
setting forget gate biases to 1 in LSTM layer 2
number of parameters in the model: 259551
cloning rnn
cloning criterion
Apply_2t_0s_0pt_-2_2_*out = tanh( *in1 )
build log:
input.cl:35:12: warning: implicit declaration of function 'tanh' is invalid in C99
unsupported call to function tanh in THClTensor_pointwiseApplyD
kernel build error:
```

kernel source:

```c
1: // OpenCL kernels....
2:
3: // expected templated values:
4: // dims (vector of unique dimension values)
5: // operation
6: // dim1
7: // dim2
8: // dim3
9: // ... dimD
10: // num_input_tensors
11: // include_scalar_input
12: //
13: // maybe should add:
14: // IndexType (hardcoded to int for now)
15: // MAX_CUTORCH_DIMS (hardcoded to 25 for now)
16:
17: // (Ported from cutorch's THCApply.cuh)
18:
19: // Maximum number of dimensions allowed for cutorch
20: // #define MAX_CUTORCH_DIMS 25
21:
22: // Enum that indicates whether tensor arguments are read/write or
23: // read-only
24: //enum TensorArgType { ReadWrite, ReadOnly };
25:
26:
27:
28: inline void op( global float *out
29:
30:   , global float *in1
31:
32:
33:
34: ) {
35:   *out = tanh( *in1 );
36: }
37:
38: kernel void
39: THClTensor_pointwiseApplyD(
40:
41:    int offset_1,
42:
43:
44:    global float *data_1,
45:
46:    int offset_2,
47:
48:
49:    int size_2_1,
50:    int stride_2_1,
51:
52:    int size_2_2,
53:    int stride_2_2,
54:
55:    global float *data_2,
56:
57:
58:
59:    int totalElements) {
60:   int linearIndex = get_global_id(0);
61:   if(linearIndex < totalElements ) {
62:
63:     int thisLinearId;
64:
65:
66:
67:
68:     int derived_offset_1 = linearIndex + offset_1;
69:
70:
71:
72:
73:     unsigned int derived_offset_2 = offset_2;
74:     thisLinearId = linearIndex;
75:     // bake this in....
76:     derived_offset_2 += (thisLinearId % size_2_2) * stride_2_2;
77:
78:     thisLinearId /= size_2_2;
79:
80:     // bake this in....
81:     derived_offset_2 += (thisLinearId % size_2_1) * stride_2_1;
82:
83:     thisLinearId /= size_2_1;
84:
85:
86:
87:
88:
89:
90:     op(
91:
92:
93:       &(data_1[derived_offset_1])
94:
95:       ,
96:       &(data_2[derived_offset_2])
97:
98:
99:
100:
101:
102:     );
103:   }
104: }
```

Something went wrong with clCreateKernel, OpenCL error code -45

```
Apply_2t_0s_0pt_-2_2_*out = tanh( *in1 )
build log:
input.cl:35:12: warning: implicit declaration of function 'tanh' is invalid in C99
unsupported call to function tanh in THClTensor_pointwiseApplyD
/home/randall/torch/install/bin/luajit: C++ exception
```

hughperkins commented 8 years ago

Can you confirm which version of OpenCL your card supports? i.e., provide the output of clinfo?

zoinksbob commented 8 years ago

```
[randall@Ahmed-Linux ~]$ clinfo
Number of platforms                       1
  Platform Name                           Clover
  Platform Vendor                         Mesa
  Platform Version                        OpenCL 1.1 MESA 11.0.4
  Platform Profile                        FULL_PROFILE
  Platform Extensions                     cl_khr_icd
  Platform Extensions function suffix     MESA

  Platform Name                           Clover
Number of devices                         1
  Device Name                             AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.0)
  Device Vendor                           AMD
  Device Vendor ID                        0x1002
  Device Version                          OpenCL 1.1 MESA 11.0.4
  Driver Version                          11.0.4
  Device OpenCL C Version                 OpenCL C 1.1
  Device Type                             GPU
  Device Profile                          FULL_PROFILE
  Max compute units                       20
  Max clock frequency                     1050MHz
  Max work item dimensions                3
  Max work item sizes                     256x256x256
  Max work group size                     256
  Preferred work group size multiple      64
  Preferred / native vector sizes
    char                                  16 / 16
    short                                 8 / 8
    int                                   4 / 4
    long                                  2 / 2
    half                                  0 / 0 (n/a)
    float                                 4 / 4
    double                                2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support   (n/a)
  Single-precision Floating-point support (core)
    Denormals                             No
    Infinity and NANs                     Yes
    Round to nearest                      Yes
    Round to zero                         No
    Round to infinity                     No
    IEEE754-2008 fused multiply-add       No
    Support is emulated in software       No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support (cl_khr_fp64)
    Denormals                             Yes
    Infinity and NANs                     Yes
    Round to nearest                      Yes
    Round to zero                         Yes
    Round to infinity                     Yes
    IEEE754-2008 fused multiply-add       Yes
    Support is emulated in software       No
    Correctly-rounded divide and sqrt operations  No
  Address bits                            32, Little-Endian
  Global memory size                      1073741824 (1024MiB)
  Error Correction support                No
  Max memory allocation                   268435456 (256MiB)
  Unified memory for Host and Device      Yes
  Minimum alignment for any data type     128 bytes
  Alignment of base address               1024 bits (128 bytes)
  Global Memory cache type                None
  Image support                           No
  Local memory type                       Local
  Local memory size                       32768 (32KiB)
  Max constant buffer size                268435456 (256MiB)
  Max number of constant args             16
  Max size of kernel argument             1024
  Queue properties
    Out-of-order execution                No
    Profiling                             Yes
  Profiling timer resolution              0ns
  Execution capabilities
    Run OpenCL kernels                    Yes
    Run native kernels                    No
  Device Available                        Yes
  Compiler Available                      Yes
  Device Extensions                       cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)   Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)    Success [MESA]
  clCreateContext(NULL, ...) [default]             Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                         Clover
    Device Name                           AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                         Clover
    Device Name                           AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.0)

ICD loader properties
  ICD loader Name                         OpenCL ICD Loader
  ICD loader Vendor                       OCL Icd free software
  ICD loader Version                      2.2.7
  ICD loader Profile                      OpenCL 1.2
```

hughperkins commented 8 years ago

Hmmm.... OpenCL 1.1... living right on the edge. As far as I can tell from your original stack trace, the implementation of OpenCL you are using is missing the tanh function. I can't be 100% certain of this without going off and checking the exact OpenCL implementation you are using, but the following lines in your output make it seem likely:

```
Apply_2t_0s_0pt_-2_2_*out = tanh( *in1 )
build log:
input.cl:35:12: warning: implicit declaration of function 'tanh' is invalid in C99
```

If possible, I reckon the easiest way to solve this would be to upgrade your GPU drivers. Otherwise, I'm not sure what else can be done really. tanh is kind of vital for many things. You could try a different activation function, I suppose, i.e., use sigmoid activations instead of tanh, perhaps?

(Edited to add the phrase "is missing" in the first sentence)

Edit2: didn't notice this is for char-rnn rather than a generic clnn issue, but anyway: if you open the model/LSTM.lua file, you can see there are lines like `local in_transform = nn.Tanh()(n4)`. You could try changing those Tanh to Sigmoid. I'm guessing it won't work terribly well, but it might run, and it might work ok-ish

hughperkins commented 8 years ago

Hey! You know, one option would be to simply hack clnn a bit. git clone the clnn repository (https://github.com/hughperkins/clnn), and then in Tanh.lua, change line 3 to use exp instead. You can make use of the expression:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Edit: the way to do this would probably be to use apply, or maybe map, like `self.output:map(input, '*out = [something here...]')`

Edit2: documentation on apply/map/map2 at https://github.com/hughperkins/cltorch#applymapmap2

Edit3: maybe something like: `self.output:map(input, 'x = (exp(y) - exp(-y)) / (exp(y) + exp(-y))')`

hughperkins commented 8 years ago

Update: as far as I can tell, Clover seems to run purely in software, i.e. basically as x86 code on the CPU, rather than as OpenCL on the GPU hardware?

hughperkins commented 8 years ago

(and so I reckon you need to get the proprietary Radeon drivers from AMD)

hughperkins commented 8 years ago

I think this issue should probably be closed now?

zoinksbob commented 8 years ago

Yes, you can close the issue. Thanks very much for looking at it. I appreciate your responses. It looks like I just need to run on the CPU.

hughperkins commented 8 years ago

Hi, apparently I'm wrong. Clover does actually run on the GPU :-) See https://github.com/Element-Research/rnn/issues/41