hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

How to Save/Load a NeuralNet via python? #84

Open NKUCodingCat opened 8 years ago

NKUCodingCat commented 8 years ago

I read through the interface declared in net/NeuralNet.h and NeuralNet.pyx, and there does not seem to be a function for dumping an entire NeuralNet together with its trained weights. Also, the weight.dat file referred to elsewhere is not produced when I train a network via the Python interface, and the data returned by getOutput() in NeuralNet.pyx is very hard to interpret. So, would it be possible to add a function that can save/load a complete CNN (in whatever form), so that a trained network can be reused elsewhere? In short: how can I save and reuse a trained net from Python?

Meanwhile, I tried simply dumping it with cPickle, but that failed.

NKUCodingCat commented 8 years ago

I think I get the point~ :-P

NKUCodingCat commented 8 years ago

Ok, I tried writing a script to rebuild a net, and it seems to work. However, when I try to use this net to predict something, it crashes without any error output. The code is here: https://1drv.ms/u/s!AiEbtKTwM8EbhTLN_waVUxz9q3PA

Run make_pkl.py to build a serialized net and run crash_code.py to reproduce the crash. DeepCL_sl.py is the module I wrote; everything runs under Python 2.7.

Another question: what is your numpy version? Maybe that is the reason you cannot use numpy.core.multiarray in pickle/cPickle. I hope you can reproduce it.

NKUCodingCat commented 8 years ago

I think I have built a network identical to the previous one (at least they look the same in asString()) and loaded its weights using the setWeights() function. In principle, it should produce the same results as the original.
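Roughly, the approach looks like this (a sketch only: it assumes the Python wrapper exposes per-layer accessors along the lines of getNumLayers(), getLayer(i), getWeights() and setWeights(); only setWeights() is mentioned above, so the other names are illustrative):

import pickle  # cPickle works the same way on Python 2.7
import numpy as np
import PyDeepCL

def save_weights(net, path):
    # collect each layer's weights as a plain numpy array (None for weightless layers)
    weights = []
    for i in range(net.getNumLayers()):
        w = net.getLayer(i).getWeights()
        weights.append(None if w is None else np.array(w, dtype=np.float32))
    with open(path, 'wb') as f:
        pickle.dump(weights, f)

def load_weights(net, path):
    # the net must be rebuilt first with exactly the same architecture (compare asString())
    with open(path, 'rb') as f:
        weights = pickle.load(f)
    for i, w in enumerate(weights):
        if w is not None:
            net.getLayer(i).setWeights(w)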

hughperkins commented 8 years ago

When I run make_pkl.py, I get:

Traceback (most recent call last):
  File "make_pkl.py", line 59, in <module>
    128)
  File "NetLearner.pyx", line 4, in PyDeepCL.NetLearner.__cinit__ (PyDeepCL.cpp:16812)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'
hughperkins commented 8 years ago

So, I changed lines 23 and 30 to pass dtype=np.int32 (on 64-bit Python, an integer numpy array defaults to int64, which the Cython buffer rejects as 'long'):

labels = np.array(lab, dtype=np.int32)
hughperkins commented 8 years ago

But right, after making that change, it crashes as you say.

hughperkins commented 8 years ago

Ok, I managed to get this down to a minimal test case that reproduces it:

test.py:

import PyDeepCL

def get_net():
    cl = PyDeepCL.DeepCL()
    net = PyDeepCL.NeuralNet(cl, 1, 1)
    net.addLayer(PyDeepCL.ActivationMaker().tanh())
    return net

call_test.py:

import test

net = test.get_net()
net.setBatchSize(1)

Result:

$ python call_test.py
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
i 0
i 1
Segmentation fault
hughperkins commented 8 years ago

Ok, moving the cl creation into the caller fixes it, i.e. test.py:

import PyDeepCL

def get_net(cl):
    net = PyDeepCL.NeuralNet(cl, 1, 1)
    net.addLayer(PyDeepCL.ActivationMaker().tanh())
    net.setBatchSize(1)
    return net

call_test.py:

import PyDeepCL
import test

cl = PyDeepCL.DeepCL()
net = test.get_net(cl)
net.setBatchSize(1)

Result:

$ python call_test.py
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
i 0
i 1
i 0
i 1
$  

(the "i 0", "i 1" lines come from some debug code I added to DeepCL; you can just ignore them)

NKUCodingCat commented 8 years ago

So it means I should pass a cl object into my function instead of creating it there? Is it caused by Python's gc behaviour?

hughperkins commented 8 years ago

So it means I should pass a cl object into my function instead of creating it there?

That will get your code working now, yes.

Is it caused by Python's gc behaviour?

Yes, I assume so. Whether I can, e.g., assign the cl object to the net object in net's __init__ method is something I'm pondering.

hughperkins commented 8 years ago

Addressed in 450dba1. I'll create a new binary soonish (I'll probably wait till the next hour to start the ec2 build instances).

NKUCodingCat commented 8 years ago

I tried returning the cl object together with the net object, and that solved the crash too, so it must be a gc-caused problem. Easy to trip over, I think.
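Concretely, the workaround amounts to keeping the DeepCL context reachable from Python for as long as the net is used, e.g. by returning it alongside the net (a sketch based on the minimal test case above):

import PyDeepCL

def get_net():
    cl = PyDeepCL.DeepCL()
    net = PyDeepCL.NeuralNet(cl, 1, 1)
    net.addLayer(PyDeepCL.ActivationMaker().tanh())
    # return the DeepCL context as well, so the caller keeps a live reference
    # and the garbage collector cannot destroy the OpenCL state under the net
    return net, cl

net, cl = get_net()
net.setBatchSize(1)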

NKUCodingCat commented 8 years ago

Oh, I have another request: can I disable the OpenCL info output? Something like:

   forward kernel 4 time: 1ms
   forward kernel 5: cannot be used
   forward kernel 6 time: 3ms
   forward kernel 7 time: 706ms
   forward layer selected kernel 4
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
   ... not valid
forward try kernel 6
   ... seems valid
ForwardAuto: kernel 6 3ms
forward try kernel 5
cl/forward_fc_wgperrow.cl build log:
"W:\Users\admin\AppData\Local\Temp\OCL21BD.tmp.cl", line 75: warning: variable
          "loopsPerExample" was declared but never referenced
      const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
                ^

   ... seems valid
ForwardAuto: kernel 5 2ms
forward try kernel 5
cl/forward_fc_wgperrow.cl build log:
"W:\Users\admin\AppData\Local\Temp\OCL221B.tmp.cl", line 75: warning: variable
          "loopsPerExample" was declared but never referenced
      const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
                ^

   ... seems valid
ForwardAuto: kernel 5 3ms
hughperkins commented 8 years ago

You mean, disable all the spammy stuff, that you've shown in the codeblock?

NKUCodingCat commented 8 years ago

Yeah, is it possible to disable them easily? It is harmless, but it is not useful when just using a network, either.

NKUCodingCat commented 8 years ago

Just like the verbose option when booting a hackintosh: if something does not work we need it, but it is hidden by default :-P
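In the meantime, one possible workaround from the Python side (a sketch only, assuming the kernel-tuning messages are written by the C++ code to the process-level stdout/stderr, so redirecting sys.stdout alone would not catch them) is to redirect the file descriptors around the noisy calls:

import os
from contextlib import contextmanager

@contextmanager
def suppress_native_output():
    # redirect the C-level stdout/stderr (fd 1 and 2) to the null device,
    # which also silences output written directly by the C++ library
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved_out, saved_err = os.dup(1), os.dup(2)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        os.close(saved_out)
        os.close(saved_err)
        os.close(devnull)

# usage: wrap whichever call produces the spam, e.g.
# with suppress_native_output():
#     run_forward_pass(net, images)   # hypothetical helper, just for illustration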

hughperkins commented 8 years ago

Hmmm, seems like configurable logging will need a bit more work than exposing one switch. Will ponder...

hughperkins commented 8 years ago

Seems like this would need some kind of configurable logging framework. I'm not really up on which logging frameworks work well in C++. Thoughts?

NKUCodingCat commented 8 years ago

Uh... no idea about that... C++ is unfamiliar to me...

It seems most C++ programs do not need a logging framework...

hughperkins commented 8 years ago

It seems most C++ programs do not need a logging framework...

Not sure. Basically, in our case we want to be able to turn logging on/off at runtime. So, at the very least, we would need to create methods like debug, info, warning, etc., and replace all the couts with those. That's... quite a lot of work :-P

NKUCodingCat commented 8 years ago

How about google glog? Just replace cout with LOG(INFO); it seems easy to use (it just looks easy, I have not tried it).

viper7882 commented 7 years ago

Hi @hughperkins,

I'm looking for a way to save and load a trained net in Python 2.7. The changes made by @NKUCodingCat look logical, but I'm unable to find any of his save and load functionality in the examples provided by DeepCL. Do you by any chance have plans to provide a public Python API for saving and loading a net in Python 2.7?

hughperkins commented 7 years ago

I do not intend to do that. But just through lack of time, rather than any fundamental objection. Is this something you might consider contributing to?

viper7882 commented 7 years ago

Noted with thanks. Let me see how I could contribute.