hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

Weird crash #83

Closed NKUCodingCat closed 8 years ago

NKUCodingCat commented 8 years ago

Well, it's me again. Yesterday I ran a data set of size 33k*17*17 using a script like test_deepcl_numpy.py and it worked pretty well. However, when I tried a dataset of size 14k*50*50, it told me "Out of resources, code -5". So I tried only the top 100 samples of my data, but it crashed without any other error information. I suspected that my GPU memory was too small, so I tried an old graphics card, an HD6450 with 2GB RAM, but it still crashed.

Meanwhile, the top 10 samples work well, but that's not helpful. The crash threshold seems to be about 20 samples or lower. I used GPU-Z to monitor the memory, and it only increased by about 10MB (when using the top 50 samples).

I am trying to tidy up and update my code... hope it is helpful for fixing the problem.

My PC:

AMD A10-5800K / 16GB RAM / Windows 10 Pro (too much memory for such a CPU, huh?)
An extra R5 230 (aka HD6450) with 2GB GDDR3 (just a card for testing)
DeepCL & PyDeepCL are the latest version
Driver provided by Windows Update

NKUCodingCat commented 8 years ago

Well, the HD6450 is rated about 3 times lower than the 7660D, so that seems reasonable. Running the normal one (33k*17*17) uses about 24MB, so I think memory may not be the point. I'm going to make some logs of my code.

NKUCodingCat commented 8 years ago

Oh my god, 10 samples also crashes. Here are my code, data, and logs: https://1drv.ms/u/s!AiEbtKTwM8EbhQ4dbnSntnRVg-me

hughperkins commented 8 years ago

What is multiarray?

(env2) ubuntu@peach:/norep/Downloads/Reproduce$ pip install numpy
Requirement already satisfied (use --upgrade to upgrade): numpy in /data/norep/envs/env2/lib/python2.7/site-packages
(env2) ubuntu@peach:/norep/Downloads/Reproduce$ python -c 'import numpy'
(env2) ubuntu@peach:/norep/Downloads/Reproduce$ python -c 'import multiarray'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named multiarray

hughperkins commented 8 years ago

Ah, this works:

>>> import numpy
>>> help(numpy)

>>> help(numpy.core)

>>> from numpy.core import multiarray
>>> 

hughperkins commented 8 years ago

Hmmm, still get it, even after adding a from numpy.core import multiarray :-P Thoughts?

imports done
Traceback (most recent call last):
  File "the_normal_code.py", line 20, in <module>
    img, lab = (zip(*pickle.load(open(sys.argv[1] + "Train_c.pkl"))))
ImportError: No module named multiarray

hughperkins commented 8 years ago

Tried a bunch of stuff, e.g. from http://stackoverflow.com/questions/3004792/cpickle-importerror-no-module-named-multiarray but the issue persists. Would it be possible to create some test code that doesn't use multiarray?

NKUCodingCat commented 8 years ago

Let me try using lists to store the file... I don't know why multiarray does not work in your env. I am on Python 2.7, so... are you on 3.4 or 3.5? I'm going to make a new data set.

hughperkins commented 8 years ago

Ok, sounds good. For the Python version, I usually use Python 3.4, but here I'm using Python 2.7, since cPickle needs 2.7. I didn't try using 3.4 instead.

NKUCodingCat commented 8 years ago

The data referenced by the_crash_code.py is saved as lists. Does it work for you?

I am uploading the new data set... it is tooooooooooo slow

NKUCodingCat commented 8 years ago

I used lists to save a new pickle file... hope it works for you: https://1drv.ms/u/s!AiEbtKTwM8EbhQ81AHf6NzSnXzBP There are so many differences between Windows and *nix, and there is another bunch of problems when using non-ASCII. I used open(... , "wb") and built-in lists, so I hope it will work.
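The workaround described here (plain built-in lists, files opened in binary mode) can be sketched roughly as follows. The helper names `to_plain`, `save_samples`, and `load_samples` and the demo file name are made up for illustration. Protocol 2 is chosen on the assumption that the file must stay readable from Python 2.7.

```python
import os
import pickle
import tempfile


def to_plain(obj):
    """Recursively turn array-like objects and tuples into built-in lists."""
    if hasattr(obj, "tolist"):          # e.g. a numpy array
        return obj.tolist()
    if isinstance(obj, (list, tuple)):
        return [to_plain(x) for x in obj]
    return obj


def save_samples(path, samples):
    # "wb": binary mode matters on Windows, where text mode rewrites line endings.
    with open(path, "wb") as f:
        pickle.dump(to_plain(samples), f, protocol=2)


def load_samples(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Two tiny (image, label) pairs standing in for the real data.
    samples = [([[0, 1], [2, 3]], 7), ([[4, 5], [6, 7]], 9)]
    path = os.path.join(tempfile.mkdtemp(), "Train_demo.pkl")
    save_samples(path, samples)
    img, lab = zip(*load_samples(path))
    print(len(img), lab)
```

Because the file then contains only lists and numbers, loading it does not require the reader to resolve any numpy-internal module names.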

hughperkins commented 8 years ago

A bit better :D

hughperkins commented 8 years ago

Ok, so for the_normal_code_2.py, this runs now, but I have to cast the arrays explicitly (images to float32, labels to int32):

img, lab = (zip(*pickle.load(open(sys.argv[1] + "Train_c2.pkl"))))
img = map(np.array, img)
N = len(lab)
images = np.array(map(lambda x: x.reshape(-1,), img)).reshape(-1,).astype(np.float32)
labels = np.array(lab).astype(np.int32)

img, lab = (zip(*pickle.load(open(sys.argv[1] + "Test_c2.pkl"))))
img = map(np.array, img)
N_t = len(lab)
images_t = np.array(map(lambda x: x.reshape(-1,), img)).reshape(-1,).astype(np.float32)
labels_t = np.array(lab).astype(np.int32)

NKUCodingCat commented 8 years ago

How about the crash one? Does it work?

hughperkins commented 8 years ago

For the_crash_code, I had to cast the labels to int32 too:

labels = np.array(lab).astype(np.int32)

... and then it runs ok for me... oh wait ... segfault :-P

ForwardAuto: kernel 7 5ms
forward try kernel 7
   ... seems valid
ForwardAuto: kernel 7 5ms
   forward kernel 0: cannot be used
   forward kernel 1 time: 1ms
   forward kernel 2 time: 2ms
   forward kernel 3 time: 2ms
   forward kernel 4 time: 1ms
   forward kernel 5 time: 1ms
   forward kernel 6 time: 1ms
   forward kernel 7 time: 5ms
   forward layer selected kernel 1
   forward kernel 0: cannot be used
   forward kernel 1 time: 0ms
   forward kernel 2 time: 0ms
   forward kernel 3 time: 0ms
   forward kernel 4 time: 0ms
   forward kernel 5 time: 0ms
   forward kernel 6 time: 0ms
   forward kernel 7 time: 5ms
   forward layer selected kernel 1
Segmentation fault

Will dig a bit

NKUCodingCat commented 8 years ago

I tried np.array(lab).astype(np.int32) in the crash one, and it got "Out of resources, code -5" again... but this time it did not pop up a "stopped working" dialog, so... maybe it is better than the original one?

It terminated without any other error info. I'm going to try again on Ubuntu and Manjaro.

hughperkins commented 8 years ago

So, unless I'm miscalculating, here's how I see the images array: shape:

images.shape (13446000,)

N_t:

N_t 14940

Then, you are specifying the images are 50x50 I think?

net = PyDeepCL.NeuralNet(cl, 1, 50)

So, if we divide 13446000 by 50, and by 50:

$ wcalc "13446000/50/50"
 = 5378.4

Seems there are fewer images in the pickle than N_t indicates?
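The same sanity check can be turned around: dividing the flat length by the sample count gives the number of values per image, and the side length follows from that. A small sketch using the numbers printed above. `infer_side` is a hypothetical helper, and it assumes square images with a known number of planes.

```python
def infer_side(flat_len, n_samples, planes=1):
    """Return the side length of square images, or raise if the sizes disagree."""
    per_image, rest = divmod(flat_len, n_samples * planes)
    if rest:
        raise ValueError("%d values do not split evenly into %d images"
                         % (flat_len, n_samples))
    side = int(round(per_image ** 0.5))
    if side * side != per_image:
        raise ValueError("%d values per image is not a perfect square" % per_image)
    return side


# How many 50x50 images would fit in the flat array?
print(13446000 / 50 / 50)

# What side length is consistent with 14940 samples?
print(infer_side(13446000, 14940))
```

With these inputs the second call returns 30, which suggests the images are smaller than the 50 passed to the network.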

NKUCodingCat commented 8 years ago

Huh... seems there is some mistake... let me validate my data... I received it from others.

hughperkins commented 8 years ago

Is it because I suggested batching up the images, but didn't think very far about how to handle back-propagation if we do that?

hughperkins commented 8 years ago

Hmmm, actually backpropagation will be kind of ... tricky :-P if we batch up the images. Not saying it's impossible, but it would definitely be ... challenging. We'd need to somehow handle the whole softmax bit ourselves probably, and also the last few layers will look really weird, because we'd need to somehow have three independent paths for the fc layers...

hughperkins commented 8 years ago

On the whole, probably better and less buggy to actually update the code to handle rectangular images...

NKUCodingCat commented 8 years ago

Ok, I found the error: the images are 30*30, not 50*50... sorry for my mistake...

NKUCodingCat commented 8 years ago

Yes, you are right... passing 1-D data is hard to debug, and joining it all together messes things up... so I think changing the interface slightly to handle a 2D or 3D matrix is better. Using a 4D matrix to represent a set of colored images looks more friendly and less buggy.

hughperkins commented 8 years ago

Hmmm, fair point :-)

hughperkins commented 8 years ago

Note that for now, you can keep it 3d/4d, and flatten it just at the point of entry, like:

netLearner = PyDeepCL.NetLearner(
    sgd, net,
    N, images.reshape(-1), labels,
    N_t, images_t.reshape(-1), labels_t,
    16)

It is true that I could do this internally, but this way might work for now?

hughperkins commented 8 years ago

Also, I guess it might be good to sanitize the input size, rather than just crashing :-)
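One way such a check could look on the Python side, before the data reaches the native code. This is only a sketch: `check_input_size` is a hypothetical helper, not part of PyDeepCL, and the parameter names are assumptions.

```python
def check_input_size(name, flat_len, n_samples, planes, image_size):
    """Raise a readable error if a flat image buffer has the wrong length."""
    expected = n_samples * planes * image_size * image_size
    if flat_len != expected:
        raise ValueError(
            "%s: expected %d values (%d samples x %d planes x %dx%d), got %d"
            % (name, expected, n_samples, planes, image_size, image_size, flat_len))


# With the sizes from this thread, a 50x50 network rejects the data up front.
try:
    check_input_size("images", 13446000, 14940, 1, 50)
except ValueError as e:
    print(e)

# The same buffer is accepted for 30x30 images (no exception raised).
check_input_size("images", 13446000, 14940, 1, 30)
```

Failing early with a message like this is easier to act on than a segfault or an "Out of resources" error from the OpenCL layer.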

NKUCodingCat commented 8 years ago

I have no idea how back-propagation would work with images batched up to be square... maybe it would need another algorithm to train it. So I think padding each image to a larger square matrix is a good choice for now.

hughperkins commented 8 years ago

Yes, padding sounds good for now :-)
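A minimal sketch of that padding step with numpy, assuming 2D single-channel images and zero as the fill value. `pad_to_square` is a made-up helper name, and centring the image is a choice made here, not something the thread specifies.

```python
import numpy as np


def pad_to_square(img, size=None, fill=0):
    """Centre a 2D image inside a size x size square filled with `fill`."""
    h, w = img.shape
    if size is None:
        size = max(h, w)
    if size < max(h, w):
        raise ValueError("target size %d is smaller than image %dx%d" % (size, h, w))
    top = (size - h) // 2
    left = (size - w) // 2
    # pad_width is ((before, after) for rows, (before, after) for columns).
    return np.pad(img,
                  ((top, size - h - top), (left, size - w - left)),
                  mode="constant", constant_values=fill)


rect = np.ones((2, 4), dtype=np.float32)
square = pad_to_square(rect)
print(square.shape)
```

All images in a data set would need the same target `size`, so in practice it would be passed explicitly rather than inferred per image.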

NKUCodingCat commented 8 years ago

Wow, I had thought about simply using reshape(-1), but I wasn't quite sure about the behavior of the reshape function. It will not affect the result, though. So this may become the solution for me :-)
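For reference, numpy's `reshape(-1)` flattens in row-major (C) order by default, so each image's values stay together and in order. A small check, assuming a layout of (N, planes, height, width). The array contents are arbitrary demo values.

```python
import numpy as np

# 3 images, 1 plane, 2x2 pixels, filled with 0..11 so positions are easy to track.
images = np.arange(3 * 1 * 2 * 2, dtype=np.float32).reshape(3, 1, 2, 2)
flat = images.reshape(-1)
print(flat.shape)

# Image i occupies one contiguous slice of the flat array.
per_image = 1 * 2 * 2
for i in range(3):
    chunk = flat[i * per_image:(i + 1) * per_image]
    assert np.array_equal(chunk, images[i].ravel())

# Reshaping back recovers the original array.
assert np.array_equal(flat.reshape(3, 1, 2, 2), images)
```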

hughperkins commented 8 years ago

k :-)

hughperkins commented 8 years ago

(updated to accept 4d tensor in bf77b9d2c )

hughperkins commented 8 years ago

(built as 9.0.1 )

hughperkins commented 8 years ago

(well.... building)

NKUCodingCat commented 8 years ago

Why not add numpy as a requirement here? It seems more graceful (or Pythonic): https://github.com/hughperkins/DeepCL/blob/bf77b9d2c1beda031c803f17f37414ee6ed470aa/python/setup.py#L129

Numpy can be installed by pip on both Windows and *nix, Python 2 & 3, so it will only add a bit of complexity.

hughperkins commented 8 years ago

Good idea :-) 5b61476

NKUCodingCat commented 8 years ago

Well, the bad news is that I cannot use pip to install DeepCL 9.0.1. It told me that PyDeepCL.pyx does not match any files.

I don't know what happened. I'll keep trying.

hughperkins commented 8 years ago

Hmmm, that's odd. Can you try the following?

sudo apt-get install python-virtualenv
virtualenv -p python2 /tmp/p2_c
source /tmp/p2_c/bin/activate
pip install DeepCL

... and provide the full output?

For me I get:

(p4) ubuntu@peach:/tmp$ virtualenv -p python2 p2_c
Running virtualenv with interpreter /tmp/p4/bin/python2
Using real prefix '/usr'
New python executable in /tmp/p2_c/bin/python2
Also creating executable in /tmp/p2_c/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
(p4) ubuntu@peach:/tmp$ source p2_c/bin/activate
(p2_c) ubuntu@peach:/tmp$ pip install DeepCL
Collecting DeepCL
  Using cached DeepCL-9.0.1.tar.gz
Building wheels for collected packages: DeepCL
  Running setup.py bdist_wheel for DeepCL ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/36/fa/06/c44daca5984c3bb5302909950984e43dacfeaa316b5a819371
Successfully built DeepCL
Installing collected packages: DeepCL
Successfully installed DeepCL-9.0.1

NKUCodingCat commented 8 years ago

Oh, I am on Windows... Ubuntu may work well, but let me test it again.

NKUCodingCat commented 8 years ago

Ok, here are my test results:

Python 2.7 x86 upgrades to 9.0.1 pretty well.
Python 2.7 x64 provided by python.org works well, too.

But the Python 2.7 x64 provided by Intel does not work. Let me uninstall PyDeepCL and try again.

hughperkins commented 8 years ago

"provided by Intel"? https://software.intel.com/en-us/python-distribution ?

NKUCodingCat commented 8 years ago

Yes, it is. I got this output again:

C:\Users\NKUCodingcat>C:\IntelPython27\python.exe -m pip install --pre deepcl -i http://pypi.v2ex.com/simple --trusted-host pypi.v2ex.com
Collecting deepcl
  Downloading http://pypi.v2ex.com/packages/ff/35/6981ad57ad8f74a21bd5c4a3b966313e8c9cb269298452ca668c3792a9c7/DeepCL-9.0.1.tar.gz (147kB)
    100% |████████████████████████████████| 153kB 696kB/s
    Complete output from command python setup.py egg_info:
    cythonizing...
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\nkucod~1\appdata\local\temp\pip-build-uok4ak\deepcl\setup.py", line 98, in <module>
        ext_modules = cythonize(ext_modules)
      File "C:\IntelPython27\lib\site-packages\Cython\Build\Dependencies.py", line 758, in cythonize
        aliases=aliases)
      File "C:\IntelPython27\lib\site-packages\Cython\Build\Dependencies.py", line 651, in create_extension_list
        for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
      File "C:\IntelPython27\lib\site-packages\Cython\Build\Dependencies.py", line 103, in nonempty
        raise ValueError(error_msg)
    ValueError: 'PyDeepCL.pyx' doesn't match any files

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\users\nkucod~1\appdata\local\temp\pip-build-uok4ak\deepcl\

It is hard for me to connect to PyPI, so I use a mirror.

NKUCodingCat commented 8 years ago

Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

Oh, his Python is provided by Anaconda, interesting.

hughperkins commented 8 years ago

Ah, I can reproduce the problem. It's related to cython being present. I'll ponder a moment how to solve that...
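The traceback above is consistent with a common packaging pitfall: `setup.py` calls `cythonize` whenever Cython can be imported, while the `.pyx` file is not present in the downloaded tarball. A sketch of one usual way to guard against that. This is an assumption about the failure mode, not a description of what the actual fix does, and `pick_extension_source` and the `PyDeepCL.cpp` fallback name are illustrative.

```python
import os
import tempfile


def pick_extension_source(src_dir, stem="PyDeepCL"):
    """Prefer the .pyx only when Cython is importable AND the file is present."""
    pyx = os.path.join(src_dir, stem + ".pyx")
    cpp = os.path.join(src_dir, stem + ".cpp")
    try:
        import Cython  # noqa: F401  (only checking availability)
        have_cython = True
    except ImportError:
        have_cython = False
    if have_cython and os.path.exists(pyx):
        return pyx, True       # caller should run cythonize()
    return cpp, False          # fall back to the pre-generated file


# Demo: a directory that, like the failing build dir, contains no .pyx file.
demo_dir = tempfile.mkdtemp()
source, needs_cythonize = pick_extension_source(demo_dir)
print(os.path.basename(source), needs_cythonize)
```

With this guard, an installed Cython no longer changes the outcome when the `.pyx` is missing, which matches the symptom that only some Python distributions (those bundling Cython) failed.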

hughperkins commented 8 years ago

Maybe fixed in b80a142. Do you want to retry? v9.0.2

hughperkins commented 8 years ago

oh wait... build failed. hmmm

hughperkins commented 8 years ago

Try now? v9.0.3

NKUCodingCat commented 8 years ago

Seems it's not easy to fix……

Sorry that I can't be of any help.

hughperkins commented 8 years ago

Mmmm?

hughperkins commented 8 years ago

All the versions have built ok:

(screenshot: all_versions_build)

NKUCodingCat commented 8 years ago

Works on Intel Python27 now. Not sure about Anaconda, but I guess it works too.

NKUCodingCat commented 8 years ago

Anaconda passed. Thank you for your help.