hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

How to Save/Load a NeuralNet via python? #84

Open NKUCodingCat opened 8 years ago

NKUCodingCat commented 8 years ago

I read the interfaces declared in net/NeuralNet.h and NeuralNet.pyx, and there seems to be no function for dumping an entire NeuralNet together with its trained weights. The weight.dat file that is mentioned was not created when I trained a network through the Python interface, and the data returned by getOutput() in NeuralNet.pyx is very hard to interpret. Would it be possible to add a function that saves/loads an entire CNN (however it works internally), so that a trained network can be used elsewhere? In short: how do I save and reuse a trained net in Python?

I also simply tried dumping it with cPickle, but that failed.

NKUCodingCat commented 8 years ago

I tried hacking some code together to save a net. When I call net.getLayer(0).getWeights(), whether or not I have trained the net, I get:

Traceback (most recent call last):
  File "W:\Users\admin\Desktop\Reproduce\the_normal_code_2.py", line 66, in <module>
    print(net.getLayer(idx).getWeights())
  File "Layer.pyx", line 40, in PyDeepCL.Layer.getWeights (PyDeepCL.cpp:10189)
IndexError: Out of bounds on buffer access (axis 0)

Alright, I found that not all layers have weights...

NKUCodingCat commented 8 years ago

When I call getOutputCubeSize() on a RandomTranslation layer, Python crashes immediately. This function is defined in the base Layer class, and try/except cannot catch the error. @hughperkins

hughperkins commented 8 years ago

Where are you seeing RandomTranslationLayer?

DeepCL/python$ grep -i randomtranslate *
grep: benchmarking: Is a directory
grep: build: Is a directory
grep: cmake: Is a directory
grep: DeepCL.egg-info: Is a directory
grep: dist: Is a directory
grep: examples: Is a directory
grep: test: Is a directory

hughperkins commented 8 years ago

well

python$ grep -i randomtranslation *
grep: benchmarking: Is a directory
grep: build: Is a directory
grep: cmake: Is a directory
grep: DeepCL.egg-info: Is a directory
grep: dist: Is a directory
grep: examples: Is a directory
NeuralNet.pyx:                            # used for example by randomtranslations layer (for now,
NeuralNet.pyx:                            # used only by randomtranslations layer)
PyDeepCL.cpp: *                             # used for example by randomtranslations layer (for now,
PyDeepCL.cpp: *                             # used for example by randomtranslations layer (for now,
PyDeepCL.cpp: *                             # used only by randomtranslations layer)
PyDeepCL.cpp: *                             # used for example by randomtranslations layer (for now,
PyDeepCL.cpp: *                             # used only by randomtranslations layer)
PyDeepCL.cpp: *                             # used for example by randomtranslations layer (for now,
PyDeepCL.cpp: *                             # used only by randomtranslations layer)
grep: test: Is a directory

hughperkins commented 8 years ago

Oh, maybe that's the problem: perhaps the fact that it isn't defined?

hughperkins commented 8 years ago

Oh, I see the issue I think. But... how are you creating a random translations layer in order to test this?

hughperkins commented 8 years ago

The getOutputCubeSize issue is addressed in 5a53d11. Test output:

DeepCL/python$ py.test -sv test/test_basic.py -k outputcube
============================= test session starts ==============================
platform linux -- Python 3.5.1+, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /norep/envs/env3/bin/python3
cachedir: .cache
rootdir: /data/norep/git/DeepCL/python, inifile: 
collected 5 items 

test/test_basic.py::test_getoutputcubesize X server found.
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
statefultimer v0.7
PASSED

===================== 4 tests deselected by '-koutputcube' =====================
==================== 1 passed, 4 deselected in 0.34 seconds ===================

hughperkins commented 8 years ago

The getWeights() issue is addressed in 14716af. I also made it return a numpy tensor (unfortunately 1-D for now, but at least it's a numpy tensor).

Test output:

DeepCL/python$ py.test -sv test/test_basic.py -k getweights
============================= test session starts ==============================
platform linux -- Python 3.5.1+, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /norep/envs/env3/bin/python3
cachedir: .cache
rootdir: /data/norep/git/DeepCL/python, inifile: 
collected 6 items 

test/test_basic.py::test_getweights X server found.
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
statefultimer v0.7
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
net.getLayer(1).getWeights() None
net.getLayer(2).getWeights().shape (112,)
PASSED

===================== 5 tests deselected by '-kgetweights' =====================
==================== 1 passed, 5 deselected in 0.65 seconds ===================

hughperkins commented 8 years ago

I think pickling/dumping the entire network will take a bit of work. Let's simply save the weights for now, just as you are attempting?
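
Before saving anything, it can help to see which layers actually carry weights. The following is a minimal sketch, assuming getWeights() behaves as in the test output above (a 1-D array for layers with weights, None otherwise). getNumLayers() is an assumption about the wrapper; if it is not available, pass the layer count in some other way.

```python
def weight_sizes(net):
    """Return {layer_index: number_of_weights} for layers that have weights.

    Layers whose getWeights() returns None (no trainable weights)
    are skipped.
    """
    sizes = {}
    # getNumLayers() is assumed to exist on the net object.
    for idx in range(net.getNumLayers()):
        weights = net.getLayer(idx).getWeights()
        if weights is None:
            continue
        sizes[idx] = len(weights)
    return sizes
```

For the small test network above, this would report a single entry for layer 2 with 112 weights.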

NKUCodingCat commented 8 years ago

The main problem I have is how to get each layer's info and reuse it. Should I parse the output of asstring()? That sounds... uh... complicated.

Also, I have not found a wrapper for predict. Am I missing something?

hughperkins commented 8 years ago

Bunch of changes in 8fac057

test_setweights test created, and passes:

DeepCL/python$ py.test -sv test/test_basic.py -k test_setweights
============================= test session starts ==============================
platform linux -- Python 3.5.1+, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /norep/envs/env3/bin/python3
cachedir: .cache
rootdir: /data/norep/git/DeepCL/python, inifile: 
collected 7 items 

test/test_basic.py::test_setweights X server found.
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
i 2 weightsSize 112
PASSED

================== 6 tests deselected by '-ktest_setweights' ===================
==================== 1 passed, 6 deselected in 0.36 seconds ===================

hughperkins commented 8 years ago

The main problem I have is how to get each layer's info and reuse it. Should I parse the output of asstring()? That sounds... uh... complicated.

Hmmm, well, the following methods exist:

Which concrete specific information do you need?

hughperkins commented 8 years ago

Also, I have not found a wrapper for predict. Am I missing something?

You can do .forward(images), then call .getLabels() on the last layer:

    net.forward(imagesBatch)
    predictions = net.getLastLayer().getLabels()
    print('predictions', predictions)

There's an example of using this in python/test_lowlevel.py
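
If a predict-style helper is wanted, the two calls above can be wrapped in a small function. This is only a sketch: predict is a hypothetical name, not part of PyDeepCL, and it relies solely on forward(), getLastLayer() and getLabels() as shown in the snippet above.

```python
def predict(net, images_batch):
    """Run one forward pass and return the labels from the last layer.

    `predict` is a hypothetical convenience wrapper; it only chains the
    forward() and getLastLayer().getLabels() calls.
    """
    net.forward(images_batch)
    return net.getLastLayer().getLabels()
```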

NKUCodingCat commented 8 years ago

The crashing code is something like the net created in test_deepcl_numpy.py: after creating a net, call net.getLayer(1).getOutputCubeSize(). Do you need my code to test?

Apart from that, what I want is a way to get every layer's info (the info necessary to rebuild the same net).

hughperkins commented 8 years ago

(For the net weights, there's a C++ class, src/weights/WeightsPersister.h. I can probably wrap that, actually. That would allow you to read/write the weights of an entire network. You'd need to know the definition of that network, though.)

hughperkins commented 8 years ago

The crashing code is something like the net created in test_deepcl_numpy.py: after creating a net, call net.getLayer(1).getOutputCubeSize(). Do you need my code to test?

I think this is fixed already, in 5a53d11. I guess I should probably create a new binary release, right?

NKUCodingCat commented 8 years ago

Alright I will try it later , thanks a lot

hughperkins commented 8 years ago

(v10.0.0 is building now, should be ready in ~15 minutes)

hughperkins commented 8 years ago

(Build in progress.)

hughperkins commented 8 years ago

v10.0.0 built.

hughperkins commented 8 years ago

Hmmmm, so, I made it so that getOutputCubeSize() throws a normal python exception. But I guess that you probably want it to return the actual output cube size in fact? :-D

hughperkins commented 8 years ago

randomtranslationslayer.getOutputCubeSize fixed in 7e62cb2

DeepCL/python$ py.test -sv test/test_basic.py -k cube
============================= test session starts ==============================
platform linux -- Python 3.5.1+, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /norep/envs/env3/bin/python3
cachedir: .cache
rootdir: /data/norep/git/DeepCL/python, inifile: 
collected 7 items 

test/test_basic.py::test_getoutputcubesize X server found.
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
statefultimer v0.7
net.getOutputCubeSize() 75264
PASSED

======================== 6 tests deselected by '-kcube' ========================
==================== 1 passed, 6 deselected in 0.38 seconds ==============

hughperkins commented 8 years ago

(v10.0.1 building)

NKUCodingCat commented 8 years ago

Actually, I don't know how this net works. Any suggestions about what I should know when I rebuild a net? I am setting up a new mobile phone for my mom, so I cannot test anything right now...

hughperkins commented 8 years ago

Well... I guess I'm imagining that you could define the network in a python script, like:

network.py

def create_network(cl):
    ... create network here

Then, when you train, you'll do something like:

import network

... create cl ...
net = network.create_network(cl)
... train network
... save weights to pickle or similar

At prediction:

import network

... create cl ...
net = network.create_network(cl)
... load weights from pickle
... run prediction against new images etc

How does this sound?
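
The "save weights to pickle" and "load weights from pickle" steps above could look roughly like this. It is a sketch under assumptions: getNumLayers() is assumed to exist, and setWeights() is assumed to accept exactly what getWeights() returned (the test_setweights test suggests such a method exists, but its exact signature is not shown in this thread). save_weights and load_weights are hypothetical helper names.

```python
import pickle


def save_weights(net, path):
    """Pickle the weights of every layer that has any, keyed by layer index."""
    snapshot = {}
    for idx in range(net.getNumLayers()):  # assumed method
        weights = net.getLayer(idx).getWeights()
        if weights is not None:
            # Stored unchanged, so the same type is handed back on load.
            snapshot[idx] = weights
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def load_weights(net, path):
    """Restore weights into a net built by the same create_network()."""
    with open(path, "rb") as f:
        snapshot = pickle.load(f)
    for idx, weights in snapshot.items():
        net.getLayer(idx).setWeights(weights)  # assumed signature
```

Because only the weights are stored, the net passed to load_weights must have been created with the same definition as the one that was saved.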

hughperkins commented 8 years ago

(v10.0.1 built https://pypi.python.org/pypi/DeepCL/10.0.1 )

hughperkins commented 8 years ago

(Note that you'll need to reinstall the native library too for 10.0.1. This is because the fix for getOutputCubeSize is actually in the underlying native library.)

hughperkins commented 8 years ago

(the above methodology for save/load network is how it works in torch by the way, eg:

NKUCodingCat commented 8 years ago

Yeah, I know that......I plan to write a class to handle that, thank you for your tips about it

NKUCodingCat commented 8 years ago

OK, uh... I made a class to create the network, and everything seems to work great. However, I think DeepCL could provide an inverse operation of NetdefToNet.createNetFromNetdef, that is, generate a netdef string from a net. Then we could just save the InputLayer, the netdef string, and the weights, which are easier to serialize.

But there is a question ..... can each property be represented by Netdef string? I am not sure about it.

NKUCodingCat commented 8 years ago

Maybe I can write a Parser to do that, like

ConvolutionalLayer{ LayerDimensions{ inputPlanes=8 inputSize=8 numFilters=16 filterSize=5 outputSize=8 padZeros=1 biased=1 skip=0} }

can be converted to

PyDeepCL.ConvolutionalMaker().numFilters(16).filterSize(5).padZeros().biased()

But that would be buggy and not easy to use; the inverse-operation approach would probably be easier to use.
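
For illustration, the parser idea could be sketched as follows. It handles only the flat key=value form of the ConvolutionalLayer line quoted above and produces the maker expression as a string rather than calling PyDeepCL. Both function names are hypothetical, and the mapping covers only the four maker calls shown in the example.

```python
import re


def parse_layer_string(text):
    """Split 'LayerType{ ... key=value ... }' into (layer_type, params)."""
    match = re.match(r"\s*(\w+)\s*\{", text)
    if match is None:
        raise ValueError("not a layer description: %r" % (text,))
    # Every key=value pair in the example is an integer.
    params = {key: int(value)
              for key, value in re.findall(r"(\w+)=(-?\d+)", text)}
    return match.group(1), params


def conv_maker_expression(params):
    """Build the ConvolutionalMaker call chain as a string."""
    parts = [
        "PyDeepCL.ConvolutionalMaker()",
        ".numFilters(%d)" % params["numFilters"],
        ".filterSize(%d)" % params["filterSize"],
    ]
    if params.get("padZeros"):
        parts.append(".padZeros()")
    if params.get("biased"):
        parts.append(".biased()")
    return "".join(parts)
```

Anything the string does not show (or that this mapping does not know about) is silently lost, which is the fragility mentioned above.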

hughperkins commented 8 years ago

See 4bdea8a, e.g. https://github.com/hughperkins/DeepCL/blob/4bdea8aaf93712a7f1bbb2e7b85bc14e1deef61e/python/test_deepcl.py#L36

Will build a binary release now

NKUCodingCat commented 8 years ago

Wow! That's pretty cool! So it's easy to make a protocol for building a net, and it will be more portable (the binary file is so tiny, and I love it).

hughperkins commented 8 years ago

(built as v10.1.0 https://pypi.python.org/pypi/DeepCL/10.1.0 )

So it's easy to make a protocol for building a net, and it will be more portable (the binary file is so tiny, and I love it).

OK. Be aware that the netdef is only approximate. As long as you used a netdef to create the network in the first place, it should be OK. If the network contains more obscure layers or parameters, the netdef string won't show those.
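
One way to package a netdef string together with the weights is a small JSON bundle. This is a sketch, not something DeepCL provides: make_bundle and read_bundle are hypothetical names, the netdef string is taken as a plain argument (however it was obtained), and the weights are assumed to be available as a {layer_index: sequence of floats} mapping.

```python
import json


def make_bundle(netdef, weights_by_layer):
    """Serialize a netdef string and per-layer weights to JSON text."""
    payload = {
        "netdef": netdef,
        # JSON object keys must be strings, so layer indices are converted.
        "weights": {str(idx): [float(w) for w in weights]
                    for idx, weights in weights_by_layer.items()},
    }
    return json.dumps(payload, sort_keys=True)


def read_bundle(text):
    """Return (netdef, {layer_index: [float, ...]}) from JSON text."""
    payload = json.loads(text)
    weights = {int(idx): values
               for idx, values in payload["weights"].items()}
    return payload["netdef"], weights
```

The input layer dimensions could be added as further keys. Given the caveat above, such a bundle only reproduces the network faithfully if the network was originally built from that netdef.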

NKUCodingCat commented 8 years ago

It's enough to simplify my work. Thank you.

hughperkins commented 8 years ago

Cool :-)

NKUCodingCat commented 8 years ago

Installing PyDeepCL via pip on Ubuntu 16.04 LTS. The output is as follows:

nkucodingcat@nkucodingcat-To-be-filled-by-O-E-M:~$ source '/home/nkucodingcat/deepcl_64/bin/easycl_activate.sh' 
nkucodingcat@nkucodingcat-To-be-filled-by-O-E-M:~$ sudo pip install --pre DeepCL
[sudo] password for nkucodingcat: 
The directory '/home/nkucodingcat/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/nkucodingcat/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting DeepCL
  Downloading DeepCL-10.1.0.tar.gz (156kB)
    100% |████████████████████████████████| 163kB 485kB/s 
Requirement already satisfied (use --upgrade to upgrade): numpy in ./.local/lib/python2.7/site-packages (from DeepCL)
Installing collected packages: DeepCL
  Running setup.py install for DeepCL ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-mCka7b/DeepCL/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-aqsiCu-record/install-record.txt --single-version-externally-managed --compile:
    version:  10.1.0
    running install
    running build
    running build_ext
    building 'PyDeepCL' extension
    creating build
    creating build/temp.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c PyDeepCL.cpp -o build/temp.linux-x86_64-2.7/PyDeepCL.o -std=c++0x -g -Wno-unused-function -Wno-unneeded-internal-declaration -Wno-strict-prototypes -DUSE_CLEW
    cc1plus: warning: command line option ‘-Wno-strict-prototypes’ is valid for C/ObjC but not for C++
    PyDeepCL.cpp:316:32: fatal error: CppRuntimeBoundary.h: 没有那个文件或目录
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-mCka7b/DeepCL/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-aqsiCu-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-mCka7b/DeepCL/

nkucodingcat@nkucodingcat-To-be-filled-by-O-E-M:~$ x86_64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.3.1-14ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) 

I am upgrading every package via apt-get and will re-test later.

"没有那个文件或目录" means "No such file or directory".

I think there are no more Chinese characters in this output.

hughperkins commented 8 years ago

Can you double-check that you:

hughperkins commented 8 years ago

Oh wait... can you remove the sudo in front of the pip install?

NKUCodingCat commented 8 years ago

I re-downloaded and double-checked. With sudo I get the same error, but without sudo I get a permission denied error.

NKUCodingCat commented 8 years ago

The last several lines of the deepcl_unitest output are:

[       OK ] testGpuOp.addscalarinplace (52 ms)
[----------] 4 tests from testGpuOp (211 ms total)

[----------] 1 test from testjpeghelper
[ RUN      ] testjpeghelper.writeread
[       OK ] testjpeghelper.writeread (1 ms)
[----------] 1 test from testjpeghelper (1 ms total)

[----------] Global test environment tear-down
[==========] 158 tests from 29 test cases ran. (109559 ms total)
[  PASSED  ] 158 tests.

  YOU HAVE 2 DISABLED TESTS

Nothing seems broken.

hughperkins commented 8 years ago

So... basically, sudo will wipe your environment. That means the environment variables set by the activate.sh line will be ignored. So, if you want to use sudo, you need to run the activate.sh line after sudo, i.e.:

sudo bash
source '/home/nkucodingcat/deepcl_64/bin/easycl_activate.sh'
pip install DeepCL

On the other hand, I don't really test or support installing with sudo. But the permission denied error is probably because there are files in your virtualenv that are now owned by root. How are you running python? Are you using a virtualenv, or the global python installation?

NKUCodingCat commented 8 years ago

I just run python directly in the terminal... Let me try sudo with source activate.sh, since I don't like using virtualenv.

NKUCodingCat commented 8 years ago

alright, sudo bash......

NKUCodingCat commented 8 years ago

It works. I should keep this tip in mind...

It could be added to the README for newbies, I think... Going to sleep, see you.

hughperkins commented 8 years ago

Cool :-)

NKUCodingCat commented 8 years ago

Uh... I cannot call the padZeros(self, bint __padZeros) function defined here: https://github.com/hughperkins/DeepCL/blob/master/python/LayerMaker.pyx#L60. Is padZeros(True) the correct way?

hughperkins commented 8 years ago

Hmmmm, you're right, there's a duplicate padZeros(...) method. Fixed in e8ddaf3

Actually, you can simply drop the True parameter, since it defaults to True anyway: it's enough to do .padZeros(), eg see https://github.com/hughperkins/DeepCL/blob/master/python/test_lowlevel.py#L26-L28

NKUCodingCat commented 8 years ago

Yeah, it is. By the way, what does imageSize mean in FullyConnectedLayer? I found that it is set to 1 by default.

hughperkins commented 8 years ago

It doesn't mean anything; you can ignore it. Well... actually... what happened is that I was using it to predict the next move in Go (围棋), so the output would be one of 19x19 positions. I could represent that as 19x19 = 361 neurons, but I decided it felt more natural to arrange those into a 19x19 grid.

So, a fully connected layer, it's fully-connected, but you can imagine the outputs as a tower, where the height of the tower is the number of neurons, and each plane in the tower is 1x1. Or you can rearrange that tower into a square grid, and make there be only one such square.

On the whole, unless you have a good reason, you'd probably do better to keep imageSize at 1 in fully connected layers :-)
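
The "tower versus grid" picture can be made concrete with a little arithmetic. This is a sketch in plain Python, not DeepCL code: it assumes the output count is the number of planes times imageSize squared, as the description above suggests, and the row-major ordering in as_grid is an assumption chosen for illustration. Both function names are hypothetical.

```python
def fc_output_count(num_planes, image_size=1):
    """Number of output neurons: num_planes planes of image_size x image_size."""
    return num_planes * image_size * image_size


def as_grid(flat_outputs, image_size):
    """Rearrange a flat list of image_size**2 outputs into a square grid."""
    if len(flat_outputs) != image_size * image_size:
        raise ValueError("expected %d values" % (image_size * image_size))
    # Row-major layout is an assumption made for this illustration.
    return [flat_outputs[row * image_size:(row + 1) * image_size]
            for row in range(image_size)]
```

For the Go example, fc_output_count(361) and fc_output_count(1, 19) describe the same 361 neurons: once as a tower of 1x1 planes, and once as a single 19x19 plane.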