hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

DeepCL not running on Intel Beignet #80

Closed marty1885 closed 7 years ago

marty1885 commented 8 years ago

DeepCL's xor.py example doesn't run on my system (Intel i7-6700, OpenCL 1.2, beignet 1.1.1), but the unit tests pass.

Error log: https://gist.github.com/marty1885/b0a21304e605502faa6c7bd224788a91

clinfo output: https://gist.github.com/marty1885/da58404b28010f61c87ecab910372ceb

Unit test results: https://gist.github.com/marty1885/b9c3f96033aafcd77c30602e3c71128b

hughperkins commented 8 years ago

Hmmm... that's strange. It runs OK on an HD5500, which is a much older GPU but with similar geometry and so on, and I'm using beignet too. You might be using an older version of beignet, though? I built from source in April:

ubuntu@peach:~/git/beignet$ git log -n 3 --oneline
8dfec54 only release cmrt device when it is already created
0943447 write mask in disassembly not parse correctly.
3547062 assert equation issue.
ubuntu@peach:~/git/beignet$ git log -n 1
commit 8dfec54e2f3e32710702ed60f5171741360f28bb
Author: Guo Yejun <yejun.guo@intel.com>
Date:   Thu Apr 28 07:48:23 2016 +0800

    only release cmrt device when it is already created

    this patch fixed the issue at https://bugs.freedesktop.org/show_bug.cgi?id=95136

    Signed-off-by: Guo Yejun <yejun.guo@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>

I doubt that's the reason, but there seems to be no other obvious cause, so I'd be tempted to install that version as a first step. v1.1.1 is from last year:

ubuntu@peach:~/git/beignet$ git checkout Release_v1.1.1
M   examples/thirdparty/libva
Note: checking out 'Release_v1.1.1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 6be1a61... Bump version to 1.1.1
ubuntu@peach:~/git/beignet$ git log -n 1
commit 6be1a61e21647238f640f89c9b4b99443602b3e0
Author: Yang Rong <rong.r.yang@intel.com>
Date:   Thu Oct 8 11:05:36 2015 +0800

    Bump version to 1.1.1

    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
hughperkins commented 8 years ago

(It's puzzling that:

(Hmmm... do you have Python 3? Can you try with Python 3? Again, I don't see any reason why that would change anything, but there's no obvious cause I can see, so I'm looking for any differences between your environment and my own.)

(edit: actually, python 2 works for me too)

marty1885 commented 8 years ago

Hmm... Yes, I am indeed using an older version of beignet (I got it from apt-get, LOL). I'll try a newer one.
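
When comparing environments, it can help to separate the OpenCL version from the driver version in the string the device reports. The following is a minimal sketch, assuming the string has the shape "OpenCL 1.2 beignet 1.1.1" that appears in this thread; `parse_device_version` is a hypothetical helper, not part of DeepCL or EasyCL.

```python
import re


def parse_device_version(version_string):
    """Split a device version string into ((major, minor), driver_info)."""
    match = re.match(r"OpenCL (\d+)\.(\d+)\s*(.*)", version_string)
    if match is None:
        raise ValueError("not an OpenCL version string: %r" % version_string)
    major, minor, driver_info = match.groups()
    return (int(major), int(minor)), driver_info.strip()


if __name__ == "__main__":
    # String taken from the device report quoted in this thread.
    opencl_version, driver_info = parse_device_version("OpenCL 1.2 beignet 1.1.1")
    print("OpenCL version:", opencl_version)
    print("driver info   :", driver_info)
```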

I tried python3, but it couldn't find PyDeepCL. Running pip3 install --pre DeepCL gives me this:

Requirement already satisfied (use --upgrade to upgrade): DeepCL in /usr/local/lib/python2.7/dist-packages

And forcing an upgrade gives me some errors:

Collecting DeepCL
  Downloading DeepCL-8.5.2.tar.gz (140kB)
    100% |████████████████████████████████| 143kB 347kB/s 
Installing collected packages: DeepCL
  Found existing installation: DeepCL 8.3.1
    Uninstalling DeepCL-8.3.1:
      Successfully uninstalled DeepCL-8.3.1
  Running setup.py install for DeepCL ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-djNaaL/DeepCL/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-VgGVej-record/install-record.txt --single-version-externally-managed --compile:
    version:  8.5.2
    running install
    running build
    running build_ext
    building 'PyDeepCL' extension
    creating build
    creating build/temp.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c PyDeepCL.cxx -o build/temp.linux-x86_64-2.7/PyDeepCL.o -std=c++0x -g -Wno-unused-function -Wno-unneeded-internal-declaration -Wno-strict-prototypes -DUSE_CLEW
    x86_64-linux-gnu-gcc: error: PyDeepCL.cxx: No such file or directory
    x86_64-linux-gnu-gcc: fatal error: no input files
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    ----------------------------------------
  Rolling back uninstall of DeepCL
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-djNaaL/DeepCL/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-VgGVej-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-djNaaL/DeepCL/

This is weird, because I have installed it successfully before. This time it failed, and it says it's installing to Python 2 even though I'm running pip3.
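
The log above shows /usr/bin/python and python2.7 paths, which suggests that the pip3 on PATH may be bound to a Python 2 interpreter on that machine. One way to sidestep the ambiguity is to invoke pip as a module of the interpreter you actually want. A minimal sketch, where `pip_command` is a hypothetical helper:

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip"] + list(args)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    # `pip --version` reports which Python installation this pip installs into.
    subprocess.call(pip_command("--version"))
```

Running the script with python3 makes pip report the Python 3 site-packages directory; from a shell, the equivalent is `python3 -m pip install DeepCL`.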

BTW, which distro and kernel are you using? I'm running Ubuntu 16.04 with kernel 4.4.0-31-generic.

hughperkins commented 8 years ago

Ah, interesting. I think that's a problem with the manifest. I'll check that.

hughperkins commented 8 years ago

It wasn't the manifest; I think it was the setup.py. That is addressed in 27e1003 and built as v8.5.3: https://pypi.python.org/pypi/DeepCL/8.5.3 Do you mind trying again and seeing if it works better now? Installation works OK for me:

(p3b) ubuntu@peach:/tmp$ virtualenv -p python3 p3c
Running virtualenv with interpreter /tmp/p3b/bin/python3
Using real prefix '/usr/local'
New python executable in /tmp/p3c/bin/python3
Also creating executable in /tmp/p3c/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
(p3b) ubuntu@peach:/tmp$ source p3c/bin/activate
(p3c) ubuntu@peach:/tmp$ pip install DeepCL
Collecting DeepCL
  Using cached DeepCL-8.5.3.tar.gz
Building wheels for collected packages: DeepCL
  Running setup.py bdist_wheel for DeepCL ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/2c/4f/ba/303b7ddda8b99b89b4c0de26e696bf2c6106d817a581e48048
Successfully built DeepCL
Installing collected packages: DeepCL
Successfully installed DeepCL-8.5.3
hughperkins commented 8 years ago

Did you get a moment to retry this?

marty1885 commented 8 years ago

Sorry for the late reply. I found that I can't build the latest version of PyDeepCL, either from pip or from source (downloaded from PyPI).

Compiling PyDeepCL by hand gives me this:

version:  9.0.1
running install
running bdist_egg
running egg_info
writing DeepCL.egg-info/PKG-INFO
writing top-level names to DeepCL.egg-info/top_level.txt
writing dependency_links to DeepCL.egg-info/dependency_links.txt
reading manifest file 'DeepCL.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'DeepCL.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'PyDeepCL' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c PyDeepCL.cpp -o build/temp.linux-x86_64-2.7/PyDeepCL.o -std=c++0x -g -Wno-unused-function -Wno-unneeded-internal-declaration -Wno-strict-prototypes -DUSE_CLEW
cc1plus: warning: command line option ‘-Wno-strict-prototypes’ is valid for C/ObjC but not for C++
PyDeepCL.cpp:316:32: fatal error: CppRuntimeBoundary.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

This is weird because I do remember to activate DeepCL. I'm still trying to figure out what's happening.

hughperkins commented 8 years ago

Ah, that's normal actually. Can you first run:

source ~/DeepCL/dist/bin/activate.sh

(replacing `~/DeepCL` with wherever you cloned/installed the DeepCL native libraries)?
hughperkins commented 8 years ago

Oh, I just read this bit: 'This is weird because I do remember to activate DeepCL.' Hmmm...

hughperkins commented 8 years ago

Can you provide the full sequence of what you are running, and the outputs? I.e., the build of the native libraries, the activation, and the installation of the Python library?

marty1885 commented 8 years ago

Well, I used the pre-built DeepCL 9.0.1 so I could move on to the Python part faster.

Here is the log:

marty@linuxpc:~/Documents/Machine Learning$ ls
clBLAS  CLBlast  DeepCL-9.0.1  deepcl.rb  dist
marty@linuxpc:~/Documents/Machine Learning$ cd DeepCL-9.0.1
marty@linuxpc:~/Documents/Machine Learning/DeepCL-9.0.1$ # Here is the PyDeepCL dir
marty@linuxpc:~/Documents/Machine Learning/DeepCL-9.0.1$ source ../dist/bin/activate.sh 
marty@linuxpc:~/Documents/Machine Learning/DeepCL-9.0.1$ sudo python setup.py install
[sudo] password for marty: 
version:  9.0.1
running install
running bdist_egg
running egg_info
writing DeepCL.egg-info/PKG-INFO
writing top-level names to DeepCL.egg-info/top_level.txt
writing dependency_links to DeepCL.egg-info/dependency_links.txt
reading manifest file 'DeepCL.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'DeepCL.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'PyDeepCL' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c PyDeepCL.cpp -o build/temp.linux-x86_64-2.7/PyDeepCL.o -std=c++0x -g -Wno-unused-function -Wno-unneeded-internal-declaration -Wno-strict-prototypes -DUSE_CLEW
cc1plus: warning: command line option ‘-Wno-strict-prototypes’ is valid for C/ObjC but not for C++
PyDeepCL.cpp:316:32: fatal error: CppRuntimeBoundary.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
marty@linuxpc:~/Documents/Machine Learning/DeepCL-9.0.1$ 

Edit: Here is the CPATH variable

marty@linuxpc:~/Documents/Machine Learning$ source dist/bin/activate.sh 
marty@linuxpc:~/Documents/Machine Learning$ echo $CPATH
/home/marty/Documents/Machine Learning/dist/include:/home/marty/Documents/Machine Learning/dist/include/easycl:/home/marty/Documents/Machine Learning/dist/include/deepcl:
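
A quick way to check whether the compiler could find the missing header through CPATH is to walk the same colon-separated list by hand. This is a minimal sketch, not how setup.py or gcc resolve includes internally; `find_header` is a hypothetical helper. It follows GCC's documented rule that an empty CPATH element (such as the one produced by the trailing colon above) means the current working directory.

```python
import os


def find_header(header, cpath):
    """Return the first directory in cpath that contains header, or None."""
    if not cpath:
        return None
    for entry in cpath.split(os.pathsep):
        # GCC treats an empty CPATH element as the current working directory.
        directory = entry or os.curdir
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None


if __name__ == "__main__":
    cpath = os.environ.get("CPATH", "")
    print("CPATH:", repr(cpath))
    print("CppRuntimeBoundary.h found in:", find_header("CppRuntimeBoundary.h", cpath))
```

Running it both as your normal user and in the same way you run setup.py (for example under sudo) shows whether the two environments see the same search path.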
hughperkins commented 8 years ago

Oh... I think sudo erases your environment. I've never tested using sudo to install PyDeepCL. I guess it could work, but you'd need to first install DeepCL globally, and again I haven't really tested that. The approach I've tested is something like:

sudo apt-get install -y python-virtualenv
virtualenv -p python2 env27
source env27/bin/activate
pip install numpy
source ../dist/bin/activate.sh    # since running source env27/bin/activate wipes the environment
python setup.py install

I think this will be easiest, since this is what I've tested.

Otherwise, if you wanted to try doing a global installation, using sudo, I think you'd need to try something like:

sudo cp ../dist/bin/* /usr/local/bin
sudo rsync -av ../dist/include/ /usr/local/include/
sudo rsync -av ../dist/lib/ /usr/local/lib/

As I say, I've tested this zero times though. I'm not really a fan of global installations, on the whole, since they're hard to uninstall/cleanup. Something along these lines should work though.

Or... something like this might work:

sudo bash
source ../dist/bin/activate.sh
python setup.py install
marty1885 commented 8 years ago

In fact, no, it doesn't.

marty@linuxpc:~/Documents/Machine Learning$ sudo echo $CPATH
/home/marty/Documents/Machine Learning/dist/include:/home/marty/Documents/Machine Learning/dist/include/easycl:/home/marty/Documents/Machine Learning/dist/include/deepcl:
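
One caveat about this check: in `sudo echo $CPATH`, the calling shell substitutes `$CPATH` before sudo starts, so the value is printed whether or not sudo keeps the variable (whether it does depends on the sudoers configuration). The sketch below imitates that effect without invoking sudo: a child process is started with CPATH removed from its environment, yet it still prints the value because the parent passed it in as an already-expanded argument. `run_with_scrubbed_env` is a hypothetical helper, and the scrubbed environment is only a stand-in for what sudo may do.

```python
import os
import subprocess
import sys

# Child program: print its first argument, then its own view of CPATH.
CHILD = "import os, sys; print(sys.argv[1]); print(os.environ.get('CPATH', '<unset>'))"


def run_with_scrubbed_env(value):
    """Start a child without CPATH in its environment, passing value as an argument."""
    parent_env = dict(os.environ, CPATH=value)
    # The parent expands the variable before the child starts, like `$CPATH` in a shell.
    expanded = parent_env["CPATH"]
    child_env = {k: v for k, v in parent_env.items() if k != "CPATH"}
    output = subprocess.check_output([sys.executable, "-c", CHILD, expanded], env=child_env)
    return output.decode().splitlines()


if __name__ == "__main__":
    argument_seen, cpath_seen = run_with_scrubbed_env("/tmp/dist/include")
    print("argument passed in:", argument_seen)
    print("CPATH inside child:", cpath_seen)
```

To see what sudo itself passes on, the expansion has to happen inside the privileged process, for example with `sudo sh -c 'echo $CPATH'`.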

But running as root or using virtualenv does let gcc find the header. It then gives me this, though:

c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -L/home/marty/文件/Machine Learning/dist/lib -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/PyDeepCL.o build/temp.linux-x86_64-2.7/CyWrappers.o -L../dist/lib -L../dist/lib/import -Wl,-R. -lclBLAS -lEasyCL -lDeepCL -o build/lib.linux-x86_64-2.7/PyDeepCL.so
c++: error: Learning/dist/lib: No such file or directory

This looks like another issue with whitespace in paths... (I'll fix this later, when I have some spare time.)
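
The error message is what you would expect if the `-L` option is pasted into a command line without quoting: the space in "Machine Learning" splits it into two arguments, and the second half is then treated as an input file. A minimal sketch using Python's shlex module; the path is a hypothetical stand-in for the one in the log, and `link_args` is an illustrative helper, not code from setup.py.

```python
import shlex

# Hypothetical library directory containing a space, as in the log above.
LIB_DIR = "/home/marty/Documents/Machine Learning/dist/lib"


def link_args(lib_dir, quoted):
    """Split a linker command line the way a POSIX shell would."""
    option = "-L" + lib_dir
    if quoted:
        option = shlex.quote(option)
    return shlex.split("c++ " + option + " -lDeepCL")


if __name__ == "__main__":
    print(link_args(LIB_DIR, quoted=False))  # the path is broken into two arguments
    print(link_args(LIB_DIR, quoted=True))   # the path survives as one argument
```

Passing arguments as a list (as subprocess does when given a list) or quoting them before building a command string avoids the problem; renaming the directory, as done below, is the simplest workaround.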

After renaming my folder, I installed PyDeepCL in a virtualenv, and it works. deepcl_unittests still passes, BUT xor.py still fails with the same error. Here is the new error log from xor.py: https://gist.github.com/marty1885/ff55777c805bb7000c35f5da3eff21e0

hughperkins commented 8 years ago

> After renaming my folder. I installed PyDeepCL on virtualenv.

Ok

> Here is the new error log from xor.py (It's still the same error) https://gist.github.com/marty1885/ff55777c805bb7000c35f5da3eff21e0

Hmmm, that's weird. I'm going to have to think a bit, and then come up with some diagnostics to try. The unit tests all run just fine, right? That's odd then, since they should be exercising anything that comes up in xor. What about some of the other Python stuff? What happens if you run any of:

?

marty1885 commented 8 years ago

Sure. But I'll have to run these tests while I'm sleeping. DeepCL (in fact, almost every OpenCL application, e.g. SmallPT GPU, LuxRender, etc.) freezes my desktop while it is running.

I'll give you the results tomorrow, my local time (UTC+8).

marty1885 commented 8 years ago

I guess a properly running DeepCL program doesn't freeze my desktop! Great!

are all running without errors.

hughperkins commented 8 years ago

Hmmm. Interesting. So, just to reconfirm:

marty1885 commented 8 years ago

Yes, here is a clearer version

I just tried the benchmarks. They run OK, but exit with an error: AttributeError: 'array.array' object has no attribute 'reshape'. (It might just be that I don't know how to use the benchmarks.)
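
That AttributeError means a standard-library array.array reached code that expected an object with a reshape method, such as a NumPy array. Which type the benchmark scripts expect is not shown in this thread, so the following is only a sketch of the mismatch; `reshape_rows` is a hypothetical pure-Python stand-in.

```python
import array


def reshape_rows(values, n_rows, n_cols):
    """Return values as a list of n_rows lists with n_cols items each."""
    if len(values) != n_rows * n_cols:
        raise ValueError("cannot reshape %d items into %dx%d" % (len(values), n_rows, n_cols))
    return [list(values[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


flat = array.array('f', [0, 1, 2, 3, 4, 5])

if __name__ == "__main__":
    # array.array is a flat typed sequence and has no reshape method.
    print("has reshape:", hasattr(flat, 'reshape'))
    print(reshape_rows(flat, 2, 3))
```

If NumPy is installed, `numpy.asarray(flat).reshape(2, 3)` gives a real two-dimensional array instead of nested lists.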

Also worth noting: I just tested some OpenCL code using my own version of EasyCL (https://github.com/marty1885/EasyCL). When I run my reallySmallRSACrack example, it crashes with bizarre errors that I never got on my old system, where I developed my version of EasyCL (Intel i5 + Radeon HD 7850 with AMD APP SDK).
One of the following two errors occurs each time I run the reallySmallRSACrack example.

marty@linuxpc:~/Documents/EasyCL$ bin/reallySmallRSACrack 
selected device:
Name:       Intel(R) HD Graphics Skylake Desktop GT2
OpenCL version: OpenCL 1.2 beignet 1.1.1
One module without kernel function!
Error creating kernel reallySmallRSACrack, code -45
Segmentation fault (core dumped)

or this

marty@linuxpc:~/Documents/EasyCL$ bin/reallySmallRSACrack 
selected device:
Name:       Intel(R) HD Graphics Skylake Desktop GT2
OpenCL version: OpenCL 1.2 beignet 1.1.1
stringInput.cl:1:2: error: source file is not valid UTF-8
stringInput.cl:1:3: error: source file is not valid UTF-8
stringInput.cl:1:1: error: unknown type name 'x'
stringInput.cl:1:4: error: expected identifier or '('
stringInput.cl:1:5: warning: missing terminating '"' character

Error occured while compiling OpenCL program
Error Code: -11

These are two different errors, but it's still weird. Could it be a huge bug in Beignet?
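
For reference, the two numeric codes in the logs above map to named constants in the OpenCL headers. The table below is a small, deliberately incomplete lookup; the `describe` helper is hypothetical, but the code-to-name pairs are the standard ones.

```python
# A few OpenCL status codes and their names from the OpenCL headers.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -44: "CL_INVALID_PROGRAM",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
}


def describe(code):
    """Return the symbolic name for an OpenCL status code, if known."""
    return OPENCL_ERRORS.get(code, "unknown OpenCL error %d" % code)


if __name__ == "__main__":
    for code in (-45, -11):
        print(code, describe(code))
```

Code -11 is a program build failure, and -45 indicates that a kernel was requested from a program with no successfully built executable, so both runs appear to fail at the stage where the kernel source is compiled. That would be consistent with the "not valid UTF-8" messages, though it does not by itself show whether the fault lies in Beignet or in how the source string is passed.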

hughperkins commented 8 years ago

Could be. How about, try the very latest version of beignet master branch?