hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License

clnn with mac Intel HD4000 #16

Closed mikeconnors909 closed 8 years ago

mikeconnors909 commented 8 years ago

Hi, I am trying to run this code (https://github.com/karpathy/char-rnn), which has support for OpenCL. Normally a Mac has OpenCL preinstalled. However, when I try to run the code with the opencl option, it tells me to install the clnn and cltorch modules (they are installed) and, if they are installed, to check my OpenCL driver's configuration. This is on a MacBook Pro with an Intel HD4000 card running OS X Mavericks. Any idea what is happening? How do I fix it?

hughperkins commented 8 years ago

  1. Ok, it looks like libOpenCL.so is perhaps not being picked up. Can you run:

clinfo

Also, can you locate a file called 'libOpenCL.so' on your system somewhere, and provide the results of listing it, eg something like:

ls /usr/lib/libOpenCL.so

or perhaps something like:

ls /usr/lib/x86_64-linux-gnu/libOpenCL.so

(obviously the second path is linux-specific, but just an example that the lib might not be directly in /usr/lib)

Actually, on linux, it seems useful to do ldd $(which clinfo):

$ ldd $(which clinfo)
    linux-vdso.so.1 =>  (0x00007fff91526000)
    libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f77e7a4f000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f77e774b000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f77e7534000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f77e716f000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f77e6f6b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f77e6d4c000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f77e6a46000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f77e7c7b000)

... which shows where it is finding libOpenCL.so. Except that, on my system it is linked to libOpenCL.so.1, rather than libOpenCL.so, but the directory is correct.

  2. On another tack, might be good to find some more details about the errors. Eg, what happens if you run the cltorch unit tests, ie luajit -l cltorch -e 'cltorch.test()'?
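
The ldd check above is Linux-only; on macOS the counterpart for listing a binary's linked libraries is otool -L. A minimal sketch of a helper that picks the right tool and keeps only the OpenCL-related lines follows. The function names list_deps and filter_opencl are made up for illustration and are not part of cltorch or clnn.

```shell
# Print only the OpenCL-related lines of a dependency listing read on stdin.
# Always returns 0, even when nothing matches.
filter_opencl() {
    grep -i 'opencl' || true
}

# List the shared libraries a binary links against, using the platform's
# own tool: otool -L on macOS (Darwin), ldd everywhere else.
list_deps() {
    case "$(uname -s)" in
        Darwin) otool -L "$1" ;;
        *)      ldd "$1" ;;
    esac
}

# Usage (assumes clinfo is installed and on the PATH):
#   list_deps "$(command -v clinfo)" | filter_opencl
```

An empty result would suggest that the binary is not linked against any OpenCL library at all.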

mikeconnors909 commented 8 years ago

  1. clinfo outputs -bash: clinfo: command not found
  2. A sudo find / -name libOpenCL.so command finds nothing. I checked the two directories you listed just to make sure, and it was not there either.
  3. luajit -l cltorch -e 'cltorch.test()' has the following output:
stack traceback:
    [C]: in function 'require'
    ...Riskin-Kutz/torch/install/share/lua/5.1/cltorch/init.lua:19: in main chunk
    [C]: at 0x010eb39300
    [C]: at 0x010eb082c0

Anything I should do going forward?

hughperkins commented 8 years ago

Hmmm, can you try sudo find / -name 'libOpenCL.so*'?

I suspect you have libOpenCL.so.1 and not libOpenCL.so, in which case there are a couple of fixes for that.
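
One common workaround on Linux for a missing unversioned libOpenCL.so is a symlink pointing at libOpenCL.so.1. Whether this is one of the fixes meant here is an assumption, and it does not apply to macOS. A hedged sketch, with ensure_unversioned_opencl as a made-up helper name:

```shell
# If DIR contains libOpenCL.so.1 but no libOpenCL.so, create the
# unversioned name as a relative symlink to the versioned file.
# Returns 0 if libOpenCL.so exists afterwards, non-zero otherwise.
ensure_unversioned_opencl() {
    dir="$1"
    if [ -e "$dir/libOpenCL.so" ]; then
        return 0    # nothing to do
    fi
    if [ -e "$dir/libOpenCL.so.1" ]; then
        ln -s libOpenCL.so.1 "$dir/libOpenCL.so"
        return $?
    fi
    return 1        # neither name is present in DIR
}

# Usage (system directories normally need root, so run from a root shell):
#   ensure_unversioned_opencl /usr/lib/x86_64-linux-gnu
```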

mikeconnors909 commented 8 years ago

That outputs nothing as well. Is it possible the library doesn't exist on my computer? Is this software tested with macs?

hughperkins commented 8 years ago

> Is this software tested with macs?

Yes, you can see there are some other guys with HD4000, and it is working. For example:

https://github.com/hughperkins/cltorch/issues/8

You can see in this thread, in the output:

Using Apple platform: Apple
Using device: HD Graphics 4000

However, I don't have a Mac. I run Ubuntu 14.04. So, I can't directly test on Macs, and I have no insight into how they work. I can't try things, or experiment...

> Is it possible the library doesn't exist on my computer?

It seems likely that it doesn't, since the find is not locating it. Unless it is a .dylib instead of a .so, perhaps? There is a tool called clinfo in the AMD SDK. Might be worth installing it and trying that, but that's starting to get way outside of things I've done myself.
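
For reference, macOS normally ships OpenCL as a system framework under /System/Library/Frameworks/OpenCL.framework rather than as a libOpenCL.so file, which would explain an empty find result. A rough sketch that checks the framework path alongside the Linux paths mentioned earlier in this thread follows. The helper name find_opencl_candidates and its root parameter are hypothetical, the latter added only so the function can be tried against a scratch directory.

```shell
# Print every OpenCL library candidate that exists under the given root
# (default "/"). The candidate list is illustrative, not exhaustive.
find_opencl_candidates() {
    root="${1:-/}"
    root="${root%/}"    # strip a trailing slash so "/" becomes ""
    for candidate in \
        "$root/System/Library/Frameworks/OpenCL.framework" \
        "$root/usr/lib/libOpenCL.so" \
        "$root/usr/lib/libOpenCL.so.1" \
        "$root/usr/lib/x86_64-linux-gnu/libOpenCL.so" \
        "$root/usr/lib/x86_64-linux-gnu/libOpenCL.so.1"
    do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}

# Usage:
#   find_opencl_candidates
```

No output means none of the listed locations exists on the machine.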

However, here are some links about installing OpenCL on Mac. Can you take a look at these, and see if these throw up any useful ideas please? :

hughperkins commented 8 years ago

Hi Mike,

It's been a while, but I just noticed this issue is still open. I remember that on some systems there is a libOpenCL.so.1 but no libOpenCL.so, so I have updated clew to work with either. If you get a moment, would you mind reinstalling cltorch and seeing whether the problem is solved now?

data-ux commented 8 years ago

I'm having the same problem getting char-rnn running on Mac OS 10.11.2 (El Capitan). The machine is a MacPro with ATI Radeon HD5870. luajit -l cltorch -e 'cltorch.test()' runs without errors.

luajit -l clnn -e 'clnn.test()' runs some of the tests, but then produces:

Abs_backward
 Function call failed 
...es/Data/Users/jan/torch/install/share/lua/5.1/nn/Abs.lua:8: attempt to index field 'THNN' (a nil value)
stack traceback:
    ...es/Data/Users/jan/torch/install/share/lua/5.1/nn/Abs.lua:8: in function 'forward'
    ...Data/Users/jan/torch/install/share/lua/5.1/clnn/test.lua:267: in function 'v'
    ...Data/Users/jan/torch/install/share/lua/5.1/clnn/test.lua:2616: in function <...Data/Users/jan/torch/install/share/lua/5.1/clnn/test.lua:2614>
    [C]: in function 'xpcall'
    ...a/Users/jan/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    ...a/Users/jan/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    ...a/Users/jan/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    ...Data/Users/jan/torch/install/share/lua/5.1/clnn/test.lua:2655: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x010cd57ba0

hughperkins commented 8 years ago

You are right. I get the same issue now. Something to do with https://github.com/torch/nn/commit/ad1efeed343a6f82593a15e78ea7e7bd6ceb041c I think, though how and why is an open question.

hughperkins commented 8 years ago

Seems we probably have to merge/port https://github.com/torch/cunn/blob/master/THCUNN.lua#L57 into clnn somehow.

hughperkins commented 8 years ago

Created new issue for the Abs unit-test issue https://github.com/hughperkins/clnn/issues/21

hughperkins commented 8 years ago

(Note that the Abs issue seems solved now, so please pull down the latest clnn and retry.)

data-ux commented 8 years ago

Thanks for your efforts!

luajit -l clnn -e 'clnn.test()' now runs all the tests without errors.

require 'clnn' returns true, but with some 'symbol not found's:

th> require 'clnn'
libthclnn_searchpath    /Volumes/Data/Users/jan/torch/install/lib/lua/5.1/libTHCLNN.so  
not found: THNN_ClAbsCriterion_updateOutput...s/Data/Users/jan/torch/install/share/lua/5.1/nn/THNN.lua:109: dlsym(0x7f9569604750, THNN_ClAbsCriterion_updateOutput): symbol not found   
not found: THNN_ClAbsCriterion_updateGradInput...s/Data/Users/jan/torch/install/share/lua/5.1/nn/THNN.lua:109: dlsym(0x7f9569604750, THNN_ClAbsCriterion_updateGradInput): symbol not found
true

hughperkins commented 8 years ago

Yes. As far as I know, that message is harmless (unless you are using AbsCriterion?), but I should probably get rid of the message somehow.
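
To confirm which symbols a library actually exports, its nm listing can be inspected. The sketch below filters such a listing for a defined symbol. The helper name symbol_in_listing is made up, and the library path in the usage line is a placeholder. On Linux, a stripped .so may need nm -D instead of nm -g.

```shell
# Succeed if the symbol named in $1 appears as a *defined* symbol in an
# nm-style listing read from stdin. Lines of type "U" (undefined) are
# dropped first. An optional leading underscore is accepted because
# macOS (Mach-O) prefixes C symbols with "_".
symbol_in_listing() {
    grep -v '[[:space:]]U[[:space:]]' | grep -Eq "[[:space:]]_?$1\$"
}

# Usage (placeholder path; substitute the location of libTHCLNN.so):
#   nm -g /path/to/libTHCLNN.so | symbol_in_listing THNN_ClAbsCriterion_updateOutput \
#       && echo "exported" || echo "not exported"
```

A "not exported" result for the AbsCriterion functions would be consistent with the dlsym messages above.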

hughperkins commented 8 years ago

I think it should be fixed in https://github.com/hughperkins/clnn/commit/b2a81ed935b28140cfb2ac39e8cb28d1a48f5e5a now?

data-ux commented 8 years ago

Yes, it seems the message was harmless. I got char-rnn running ok even when it was still being generated. Now with the latest version, the message is gone.

Thanks again.

hughperkins commented 8 years ago

Ok, cool :-) I think I shall close this issue now, and any new problems can go into a new issue :-)