hughperkins / cltorch

An OpenCL backend for torch.

OS X: luajit -l cltorch = WIN! -- th -l cltorch = FAIL (with solution) #21

Closed spdustin closed 8 years ago

spdustin commented 8 years ago

The ~/torch/install/bin/th shell script that opens the torch REPL doesn't load the calling shell's environment on OS X. For me, this resulted in:

$ Sirius:cltorch dustin$ th -l cltorch -e 'cltorch.test()'
could not load cltorch, skipping
[string "cltorch.test()"]:1: attempt to index global 'cltorch' (a nil value)

Editing the th shell script mentioned above to remove the #!/bin/sh shebang at the top (so the only line is the one beginning with exec) results in:

$ Sirius:cltorch dustin$ th -l cltorch -e 'cltorch.test()'
...whole lotta stuff for tests...
all tests finished

Just thought I'd drop this in here in case it helps you debug other OS X issues.
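
For anyone who wants to try the shebang edit without hand-editing the file, here is a minimal sketch. The helper name strip_shebang and the contents of the demo file are made up for illustration; the real th wrapper's exec line will differ, so the sketch works on a throwaway file rather than on ~/torch/install/bin/th itself.

```shell
# Hypothetical helper: print a script without its leading shebang line.
strip_shebang() {
    # Delete line 1 only if it starts with "#!"; every other line passes through.
    sed '1{/^#!/d;}' "$1"
}

# Demo on a throwaway file standing in for the th wrapper (contents are invented).
demo=$(mktemp)
printf '%s\n' '#!/bin/sh' 'exec echo "wrapped command runs here"' > "$demo"
strip_shebang "$demo"
rm -f "$demo"
```

To apply this to a real script, it is probably safest to redirect the output to a new file, compare it with the original, and only then replace the original, keeping a backup.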

hughperkins commented 8 years ago

Ah... awesome info! Thank you Dustin.

ceberly commented 8 years ago

This does indeed fix the OpenCL issues for me, even on El Capitan (10.11.1).

However, now I get some JIT warnings:

THClTensorMathTransformReduce.cl build log:
<program source>:8:6: warning: no previous prototype for function 'binary_op'
Pair binary_op( Pair a, Pair b ) {
     ^
<program source>:81:45: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int'
      if (row < num_rows && get_local_id(0) < s) {
                        ~~~~~~~~~~~~~~~ ^ ~

THClReduceAll.cl build log:
<program source>:9:10: warning: unused variable 'in1'
  float *in1 = &_in1;
     ^
<program source>:10:10: warning: unused variable 'out'
  float *out = &_out;

It's working though. I wish this issue would show up higher on the Google machine :)

hughperkins commented 8 years ago

@ceberly Cool :-) For the jit warnings, just added a commit which might remove the warning about no previous prototype for TransformReduce.cl line 8, and the warning about comparison of integers of different signs, at line 81 of same file.

The warnings about in1 and out are trickier to remove, and I don't have a system that shows these warnings, so I'm leaving those for now. Will probably address them sooner or later :-)

ceberly commented 8 years ago

@hughperkins cool. If anyone else is reading this, I am running through this: https://github.com/karpathy/char-rnn

The command I am using is: th train.lua -data_dir data/tinyshakespeare/ -opencl 1 -gpuid 0

Interestingly -gpuid 1 selects the GeForce GPU on my MacBook but the Intel GPU (0) seems to be faster

Using Apple , OpenCL platform: Apple
Using OpenCL device: HD Graphics 4000

hughperkins commented 8 years ago

> Interestingly -gpuid 1 selects the GeForce GPU on my MacBook but the Intel GPU (0) seems to be faster

@ceberly Hmmm, that is quite surprising :-)

ceberly commented 8 years ago

@hughperkins I am new to torch and NNs in general, so it's almost certainly something I'm doing wrong parameter-wise. Thanks again for this post, it saved me a ton of headache.

linkerlin commented 8 years ago

I have the same problem on an El Capitan MacBook Pro. Maybe we can fix it on the master branch?

hughperkins commented 8 years ago

So, as far as I know, the issue is somehow in https://github.com/torch/trepl . I've logged it at https://github.com/torch/trepl/issues/32 . I'm not sure how to go further with this. I think it needs someone familiar with scripting, and who has a Mac box, to take a peek.

spdustin commented 8 years ago

I commented there about the issue. LuaRocks accepts a config var called wrap_bin_scripts that should be false for darwin/osx, because the wrapper script doesn't seem to add in the environment vars it's supposed to. The torch install instructions have you add a line to .bash_profile anyway that, on my Mac in any case, specifies all the necessary environment settings.

Looks something like this: . $HOME/torch/install/bin/torch-activate

So as far as I can see, that sets whatever environment needs to be available for the torch REPL to work, no need to add the wrapper for the th binary.
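
A quick way to check whether a profile file already sources torch-activate is sketched below. The function name sources_torch_activate is hypothetical, and the demo runs against a temporary file rather than a real ~/.bash_profile; it only looks for an uncommented line that starts with "." or "source" and mentions torch-activate, so treat it as a rough check, not a guarantee that the environment is set up correctly.

```shell
# Hypothetical helper: succeed if FILE has an uncommented line that
# sources something named torch-activate (via "." or "source").
sources_torch_activate() {
    grep -Eq '^[[:space:]]*(\.|source)[[:space:]].*torch-activate' "$1"
}

# Demo on a throwaway file standing in for ~/.bash_profile.
profile=$(mktemp)
printf '%s\n' '# my profile' '. $HOME/torch/install/bin/torch-activate' > "$profile"
if sources_torch_activate "$profile"; then
    echo "torch-activate is sourced"
else
    echo "torch-activate is not sourced"
fi
rm -f "$profile"
```

Pointing the function at your own profile file (for example ~/.bash_profile) tells you whether the line from the install instructions is present.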

markostam commented 8 years ago

This solved it for me as well on El Capitan, thank you.

Sarastro72 commented 8 years ago

Removing the shebang did not solve it for me, but adding torch-activate to the th script did work.

#!/bin/bash

source torch-activate

exec '...'
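
The pattern in that script (source an activation file, then exec the real program) can be tried in isolation with the sketch below. Everything in it is a stand-in: fake-activate, DEMO_TORCH_PATH, and the path value are invented for the demo and are not part of torch. It runs the wrapper with an emptied environment to mimic a caller that has not sourced anything.

```shell
# Throwaway directory so nothing real is touched.
work=$(mktemp -d)

# Stand-in for torch-activate: exports one hypothetical variable.
cat > "$work/fake-activate" <<'EOF'
export DEMO_TORCH_PATH="/hypothetical/torch/install"
EOF

# Wrapper in the same style: source the activation file, then exec the program.
# $work is expanded now; \$DEMO_TORCH_PATH is left for the wrapper to expand later.
cat > "$work/wrapper" <<EOF
#!/bin/bash
source "$work/fake-activate"
exec /bin/sh -c 'echo "\$DEMO_TORCH_PATH"'
EOF

# env -i clears the environment, so the variable can only come from the wrapper.
wrapper_output=$(env -i /bin/bash "$work/wrapper")
echo "$wrapper_output"
rm -rf "$work"
```

If the variable is printed, the exec'd program inherited it from the wrapper, which is the effect the modified th script relies on.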

hughperkins commented 8 years ago

This might be fixed now, since RPATH is set now?