hughperkins / cltorch

An OpenCL backend for torch.

zsh issues #50

Closed coodoo closed 8 years ago

coodoo commented 8 years ago

Seems a bunch of people have run into zsh not working properly with cltorch (see #31 and #24), and for now the only solution is switching back to bash, which is less than ideal. Just wondering, has anyone figured out a solution to this? Thanks!

hughperkins commented 8 years ago

Ok, I might take a look sometime.... seems like a reasonable request, and I think I can install zsh on ubuntu, afaik.

hughperkins commented 8 years ago

Hi. Ok, I installed zsh, and tried running cltorch, and so far no issues. See below. Can you produce a tiny test-case that demonstrates the issue please?

ubuntu@orange ~ % torch         
zsh: command not found: torch
ubuntu@orange ~ % source ~/torch/install/bin/torch-activate 
ubuntu@orange ~ % luajit
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

 _____              _     
|_   _|            | |    
  | | ___  _ __ ___| |__  
  | |/ _ \| '__/ __| '_ \ 
  | | (_) | | | (__| | | |
  \_/\___/|_|  \___|_| |_|

JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'torch'
th> a = torch.Tensor(2,3):uniform()
th> require 'cutorch'
th> a = torch.CudaTensor(2,3):uniform()
th> a
th>> 
th>> print(a)
stdin:3: '=' expected near 'print'
th> print(a)
 0.2988  0.0395  0.7658
 0.8750  0.2667  0.3257
[torch.CudaTensor of size 2x3]

th> require 'cltorch'
th> a = torch.ClTensor(3,2,5):uniform()
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
th> a
th>> print(a)
stdin:2: '=' expected near 'print'
th> print(a)
(1,.,.) = 
  0.8787  0.8070  0.0460  0.5314  0.4063
  0.6157  0.8125  0.1887  0.3200  0.8569

(2,.,.) = 
  0.2003  0.0125  0.4530  0.7228  0.4754
  0.1571  0.4689  0.2694  0.7298  0.5362

(3,.,.) = 
  0.5828  0.9367  0.6495  0.0552  0.9694
  0.6742  0.1739  0.5109  0.2835  0.8508
[torch.ClTensor of size 3x2x5]

coodoo commented 8 years ago

Yes I do, a minimum reproducible case is like this (on OS X 10.11 El Capitan)

with zsh

th> require 'cltorch'
/Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/jlu/torch/install/share/lua/5.1/cltorch/init.lua:19: cannot load '/Users/jlu/torch/install/lib/lua/5.1/libcltorch.so'
stack traceback:
    [C]: in function 'error'
    /Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    [string "_RESULT={require 'cltorch'}"]:1: in main chunk
    [C]: in function 'xpcall'
    /Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
    .../jlu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
    [C]: at 0x0100bc3be0

with bash

th> require 'cltorch'
{
  finish : function: 0x04d17080
  about : function: 0x04d16a00
  getDeviceCount : function: 0x04d17198

  [redacted]
}

So, precisely speaking, this is a zsh-on-OS-X issue: for some reason libcltorch.so is not found, no matter how I declare the environment variables...

hughperkins commented 8 years ago

Ah, I dont have Mac OS X ;-) Is this something you might be able to help fix?

coodoo commented 8 years ago

Sure, I would love to if I knew how. Any hints?

hughperkins commented 8 years ago

Obviously if I knew how, I would have fixed it ;-) But I would imagine it's something to do with env vars. But as to what and how it's hard to say.

Things I would try:

If that throws up nothing, you're going to have to start loading stuff, and hack around a bit. If it was me, I'd probably start writing little C programs, to load stuff from C, and find out what works, and what doesn't. Here's an example I was using to try to fix Mac problems in cltorch yesterday:

#include <iostream>
using namespace std;

#include <dlfcn.h>

extern "C" {
  #include "luaT.h"
  #include "lualib.h"
  int luaopen_libpaths(lua_State *L);
}

int main(int argc, char *argv[]) {
  // dlopen returns a handle (NULL on failure), not an error code
  void *handle = dlopen("libPyTorchLua.so", RTLD_NOW | RTLD_GLOBAL);
  cout << "handle " << handle << endl;
  // dlerror returns NULL when nothing went wrong, so guard before printing
  const char *msg = dlerror();
  cout << "dlerror " << (msg ? msg : "(none)") << endl << endl;
  handle = dlopen("/home/ubuntu/torch/install/lib/lua/5.1/libpaths.so", RTLD_NOW | RTLD_GLOBAL);
  cout << "handle " << handle << endl;

  // create a Lua state with the standard libraries, then require 'torch'
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);

  luaopen_libpaths(L);

  lua_getglobal(L, "require");
  lua_pushstring(L, "torch");
  lua_call(L, 1, 0);

  return 0;
}

Obviously you'll need to modify the names of the libraries being loaded and stuff. And you might comment out the first bit, that loads libPyTorchLua.so and libpaths, or not. Or modify it. Etc. To build it, I would think it's something like:

g++ -o mytest mytest.cpp 

... and that might be all you need. Oh.. this version actually hard-links with libpaths, so comment out the luaT and lualib headers, everything in that extern "C" section, and remove the call to luaopen_libpaths, and it should compile without many other libraries. You might need to add the dl and m libraries, like:

g++ -o mytest mytest.cpp -ldl -lm

(dl is the library for dynamic loading, and m is the maths library, which isn't actually used here, so you might not need it)

Edit: Oh, you'll need those lua headers... which means ... ummm.... you probably need to link with lua library somehow. That bit is always a bit tricky... and so... at this point... you'd have to start thinking about what is failing and where really.... ummm....

hughperkins commented 8 years ago

How about start by trying to load libcltorch.so from the c program, and see what happens.

If that works (or if it doesn't), then maybe try loading the lua library, initializing lua (luaL_openlibs), and then loading it, or requiring it.

I cant tell you an exact recipe.
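As a starting point for that first step, here is a minimal sketch of a loader probe. The function name `try_load` is hypothetical and not part of cltorch or Lua. It only asks the dynamic loader to open a path and hands back the loader's own message, which tends to be more specific than the "cannot load" text that Lua prints.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: try to dlopen `path`.
// Returns an empty string on success, otherwise the loader's error message.
// The library is left loaded on success; this is only a diagnostic probe.
std::string try_load(const char* path) {
    dlerror();  // clear any stale error state before the call
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle != nullptr) {
        return "";
    }
    const char* msg = dlerror();  // may be NULL, so guard before using it
    return msg != nullptr ? msg : "unknown dlopen failure";
}
```

Calling `try_load` from a small `main` with the full path to libcltorch.so, once from zsh and once from bash, should show whether the loader itself behaves differently in the two shells. On Linux you may need to link with -ldl.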

Something I often do is hack around in the lua library itself, which I build myself. For example, loading the library is done by loadlib.c, in the lua source. You can sprinkle printfs liberally around that, build it, link with that, run in gdb, etc... Oh, I usually put a floating point exception into loaderror, so that load errors trigger gdb to halt, instead of the program just exiting:

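static void loaderror (lua_State *L, const char *filename) {
  /* deliberate integer divide-by-zero, to raise SIGFPE so gdb halts here;
     volatile stops the compiler from optimizing the division away */
  volatile int a = 0;
  volatile int b = 5 / a;
  (void)b;
  luaL_error(L, "error loading module " LUA_QS " from file " LUA_QS ":\n\t%s",
                lua_tostring(L, 1), filename, lua_tostring(L, -1));
}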

It might need a certain amount of effort, and time :-P

hughperkins commented 8 years ago

So, I added a call to zsh to my travis script https://travis-ci.org/hughperkins/cltorch/builds/112530954#L1268 , which runs zsh against https://github.com/hughperkins/cltorch/blob/master/src/travis/install-torch.sh , but it seems like it doesn't really run zsh for some reason, since: 1. it doesn't fail 2. when I do ps, there is no zsh, which sort of hints it's not running zsh.

edit: sorry, I mean against https://github.com/hughperkins/cltorch/blob/master/src/test/test-zsh.zsh

coodoo commented 8 years ago

Just did a quick comparison in both zsh and bash, see results below.

Interestingly LD_LIBRARY_PATH and DYLD_LIBRARY_PATH never showed up, but bash still works alright.

zsh

PATH=/Users/jlu/torch/install/bin:/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:/usr/local/share/npm/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VirtualBox.app/Contents/MacOS:/Users/jlu/temp/arc/arcanist/bin/
LUA_PATH=/Users/jlu/.luarocks/share/lua/5.1/?.lua;/Users/jlu/.luarocks/share/lua/5.1/?/init.lua;/Users/jlu/torch/install/share/lua/5.1/?.lua;/Users/jlu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/Users/jlu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LUA_CPATH=/Users/jlu/torch/install/lib/?.dylib;/Users/jlu/.luarocks/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/?.dylib;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
_=/usr/bin/env

bash

PATH=/Users/jlu/torch/install/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
LUA_PATH=/Users/jlu/.luarocks/share/lua/5.1/?.lua;/Users/jlu/.luarocks/share/lua/5.1/?/init.lua;/Users/jlu/torch/install/share/lua/5.1/?.lua;/Users/jlu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/Users/jlu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LANG=en_US.UTF-8
LUA_CPATH=/Users/jlu/torch/install/lib/?.dylib;/Users/jlu/.luarocks/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/?.dylib;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
_=/usr/bin/env

More interestingly, I tried to manually export DYLD_LIBRARY_PATH via command line, and seems anything starting with DYLD_ will not be recognized, looking into that aspect now.
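One possible explanation, which this thread does not confirm: the listings above come from `env` (note `_=/usr/bin/env`), and Apple documents that under SIP the dynamic-linker variables are purged when a protected system binary is launched. If so, `env` output may not reflect what the shell actually exports. A minimal sketch for checking what an ordinary, self-compiled process sees instead; the function name `loader_env` and the `<unset>` marker are illustrative choices.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: report the loader-related variables as seen by
// this process. Unset variables are reported as "<unset>".
std::vector<std::pair<std::string, std::string>> loader_env() {
    const char* names[] = {"LD_LIBRARY_PATH",
                           "DYLD_LIBRARY_PATH",
                           "DYLD_FALLBACK_LIBRARY_PATH"};
    std::vector<std::pair<std::string, std::string>> seen;
    for (const char* name : names) {
        const char* value = std::getenv(name);  // NULL when not set
        seen.emplace_back(name, value != nullptr ? value : "<unset>");
    }
    return seen;
}
```

Printing the result from a binary built in your home directory, and comparing it with what `env` reports in the same shell, would show whether the variables are being stripped on the way to the child process.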

coodoo commented 8 years ago

I copied everything from /Users/jlu/torch/install/bin/torch-activate to my ~/.zshrc and verified that all the variables can be found by checking things like $ echo $DYLD_LIBARARY_PATH, but still no dice: libcltorch.so still can't be found.

coodoo commented 8 years ago

Seems it has something to do with DYLD paths not working on os x unless SIP is disabled, details here.

What I don't understand is why bash still works without those environment variables being set?

hughperkins commented 8 years ago

I think you should have an LD_LIBRARY_PATH. On my system:

$ echo $LD_LIBRARY_PATH
/home/ubuntu/torch/install/lib:
$ cat ~/torch/install/bin/torch-activate 
export LUA_PATH='/home/ubuntu/.luarocks/share/lua/5.1/?.lua;/home/ubuntu/.luarocks/share/lua/5.1/?/init.lua;/home/ubuntu/torch/install/share/lua/5.1/?.lua;/home/ubuntu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/home/ubuntu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
export LUA_CPATH='/home/ubuntu/.luarocks/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
export PATH=/home/ubuntu/torch/install/bin:$PATH
export LD_LIBRARY_PATH=/home/ubuntu/torch/install/lib:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/home/ubuntu/torch/install/lib:$DYLD_LIBRARY_PATH
export LUA_CPATH='/home/ubuntu/torch/install/lib/?.so;'$LUA_CPATH

coodoo commented 8 years ago

Yep you are correct, both bash and zsh have that variable set, unfortunately zsh just won't work 😓

$ echo $LD_LIBRARY_PATH  
/Users/jlu/torch/install/lib:

hughperkins commented 8 years ago

For SIP, what is the recommended approach, instead of using LD_LIBRARY_PATH?

hughperkins commented 8 years ago

What happens if you copy everything from ~/torch/install/lib into ~/lib? (create ~/lib if it doesnt exist)

hughperkins commented 8 years ago

Seems it is likely an RPATH issue. Andresy pointed this out a while ago, actually, but I hadn't had a moment to find out more about it before: https://github.com/hughperkins/cltorch/issues/15

Relevant references: http://linuxmafia.com/faq/Admin/ld-lib-path.html https://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the

hughperkins commented 8 years ago

https://cmake.org/Wiki/CMake_RPATH_handling

hughperkins commented 8 years ago

Interestingly, if I clear my LD_LIBRARY_PATH, and move my build directories, everything continues to run:

(envs)ubuntu:~/git$ env | grep PATH
GLADE_PIXMAP_PATH=:
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
GLADE_MODULE_PATH=:
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
DEFAULTS_PATH=/usr/share/gconf/xfce.default.path
PATH=/home/ubuntu/torch/install/bin:/home/ubuntu/envs/bin:/home/ubuntu/bin:/home/ubuntu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
LUA_PATH=/home/ubuntu/.luarocks/share/lua/5.1/?.lua;/home/ubuntu/.luarocks/share/lua/5.1/?/init.lua;/home/ubuntu/torch/install/share/lua/5.1/?.lua;/home/ubuntu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/home/ubuntu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LUA_CPATH=/home/ubuntu/.luarocks/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
MANDATORY_PATH=/usr/share/gconf/xfce.mandatory.path
GLADE_CATALOG_PATH=:
(envs)ubuntu:~/git$ luajit -l cltorch -e 'print(torch.ClTensor(2,3):uniform())'
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
 0.3308  0.3602  0.4100
 0.6792  0.7481  0.7736
[torch.ClTensor of size 2x3]

hughperkins commented 8 years ago

My RPATHs look like:

 objdump -x ~/torch/install/lib/lua/5.1/libcltorch.so | grep RPATH
  RPATH                $ORIGIN/../lib:/home/ubuntu/torch/install/lib

Per the bottom of https://cmake.org/Wiki/CMake_RPATH_handling , on OS X you would need to use otool to view these.

coodoo commented 8 years ago

Hahaha, copying everything from ~/torch/install/lib into ~/lib did the trick!

I ended up just symlinking it with $ ln -s ~/torch/install/lib/ ~/lib and it worked fine. Not sure this is the best possible solution, but at least it works. Thanks for helping out, you rock (as usual)!
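A possible reason the ~/lib trick helps, not confirmed in this thread: the dyld man page lists $(HOME)/lib:/usr/local/lib:/lib:/usr/lib as the default value of DYLD_FALLBACK_LIBRARY_PATH, so a library sitting in ~/lib can be found by its leaf name even when no environment variable survives. A minimal sketch that mimics that lookup order, to see which directory would satisfy a given file name; both function names are hypothetical, and the real dyld search has more steps than this.

```cpp
#include <string>
#include <unistd.h>
#include <vector>

// Directories the dyld man page documents as the default fallback path,
// in order, for a given home directory (an assumption about your OS version).
std::vector<std::string> default_fallback_dirs(const std::string& home) {
    return {home + "/lib", "/usr/local/lib", "/lib", "/usr/lib"};
}

// Return the first directory in `dirs` that contains a file named `leaf`,
// or an empty string if none does.
std::string first_dir_containing(const std::vector<std::string>& dirs,
                                 const std::string& leaf) {
    for (const std::string& dir : dirs) {
        const std::string candidate = dir + "/" + leaf;
        if (access(candidate.c_str(), F_OK) == 0) {  // file exists
            return dir;
        }
    }
    return "";
}
```

Running `first_dir_containing(default_fallback_dirs(home), name)` for each library that libcltorch.so depends on, before and after creating the symlink, would show which ones only become visible through ~/lib.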

hughperkins commented 8 years ago

Ok, that's interesting. Not sure that is the most sustainable solution, but good that it is working :-)

coodoo commented 8 years ago

Absolutely!