hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

Run time error report #59

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi Hugh,

I have successfully compiled DeepCL on my PC (Windows 7 64-bit, Visual Studio 2010 x86). But when I run the MNIST example, it reports an error:

ForwardAuto: kernel 1 6ms
clblas teardown
Something went wrong: Label 28 exceeds number of softmax planes 10

The command I used to run this demo is:

deepcl_train.exe datadir=. trainfile=train-images.idx3-ubyte validatefile=t10k-images.idx3-ubyte

When I traced the error, I found that it occurs in the learning loop while(!netLearner->isLearningDone()) {...}.

Could you give me some clue about how to fix this problem? Thanks.

hughperkins commented 8 years ago

Ok. Can you confirm the md5 checksum of the datafiles please? On my machine I have the following checksums:

$ md5sum /mnist/*
67e63534a144e1be3db6c9bfca1a9ece  /mnist/model.txt
85a9c3dba219af50242a99220c272481  /mnist/output.dat
5422ae00443ee2294976a2c283dba551  /mnist/out.txt
9d835bc9f2ddd0fc6dfa6cead8cb4e00  /mnist/t10k-cat.mat
1c91e98bb03fcf9bb35e98a74dd1027c  /mnist/t10k-dat.mat
2646ac647ad5339dbf082846283269ea  /mnist/t10k-images-idx3-ubyte
27ae3e4e09519cfbb04c329615203637  /mnist/t10k-labels-idx1-ubyte
c4f731158903901b4b5999ccab5f4bd8  /mnist/train-cat.mat
c9b44655a5d1a77a49aae0f4be446d30  /mnist/train.dat
8740d0f1d641fac583a09d38f7fa7a6d  /mnist/train-dat.mat
6bbc9ace898e44ae57da46a324031adb  /mnist/train-images-idx3-ubyte
6bbc9ace898e44ae57da46a324031adb  /mnist/train-images.idx3-ubyte
a25bea736e30d166cdddb491f175f624  /mnist/train-labels-idx1-ubyte
a25bea736e30d166cdddb491f175f624  /mnist/train-labels.idx1-ubyte

ghost commented 8 years ago

I don't know what the other files are for. The md5 checksums of the MNIST data I downloaded from [1] are:

2646ac647ad5339dbf082846283269ea  ./mnist/t10k-images.idx3-ubyte
27ae3e4e09519cfbb04c329615203637  ./mnist/t10k-labels.idx1-ubyte
6bbc9ace898e44ae57da46a324031adb  ./mnist/train-images.idx3-ubyte
a25bea736e30d166cdddb491f175f624  ./mnist/train-labels.idx1-ubyte

[1] http://yann.lecun.com/exdb/mnist/

hughperkins commented 8 years ago

Hmmm... well, those md5 checksums look correct. Guess I need to fire up a Windows box... In case you're wondering, it definitely does run on my own Ubuntu 64-bit box:

deepcl_train datadir=/mnist trainfile=train-images-idx3-ubyte validatefile=t10k-images-idx3-ubyt
...
after epoch 1 13134 ms
 training loss: 19359.6
 train accuracy: 53898/60000 89.83%
test accuracy: 9697/10000 96.97%
after tests 713 ms
record epoch=1
wrote weights to file, filesize 173KB

after epoch 2 12740 ms
 training loss: 5911.4
 train accuracy: 58161/60000 96.935%
test accuracy: 9814/10000 98.14%
after tests 713 ms
record epoch=2
wrote weights to file, filesize 173KB

hughperkins commented 8 years ago

Hmmm. Well. I confirm I get the same error message on Windows.

forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
clblas teardown
Something went wrong: Label 28 exceeds number of softmax planes 10

Can you paste a dummy message into this thread, so that it stays in my 'notifications' please? (otherwise it falls out of my 'horizon')

ghost commented 8 years ago

Thank you very much!

hughperkins commented 8 years ago

Oh, I see what's happening. Basically, the filename of the labels file is determined by doing:

    string labelsFilePath = replace(imagesFilePath, "-images-idx3-ubyte", "-labels-idx1-ubyte");

You can call this hacky ;-) And it is. Anyway, it fails if the filenames use '.' instead of '-' between images and idx3. I will simply add a new line:

    string labelsFilePath = replace(imagesFilePath, "-images-idx3-ubyte", "-labels-idx1-ubyte");
    labelsFilePath = replace(labelsFilePath, "-images.idx3-ubyte", "-labels.idx1-ubyte");

That should probably fix it.
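
To illustrate why the dotted filenames trip the original one-liner, here is a minimal sketch. The names replaceFirst and labelsPathDashOnly are hypothetical; replaceFirst is only a stand-in for the replace helper, and it is an assumption (not checked against the DeepCL source) that the real helper returns its input unchanged when the pattern is not found.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the replace() helper used above. Assumption:
// the input is returned unchanged when `from` does not occur in it.
std::string replaceFirst(std::string text, const std::string &from, const std::string &to) {
    assert(!from.empty());  // an empty pattern would be a caller bug
    const std::string::size_type pos = text.find(from);
    if (pos != std::string::npos) {
        text.replace(pos, from.size(), to);
    }
    return text;
}

// The original derivation: it only recognises the dashed naming scheme.
std::string labelsPathDashOnly(const std::string &imagesFilePath) {
    return replaceFirst(imagesFilePath, "-images-idx3-ubyte", "-labels-idx1-ubyte");
}
```

Called with "train-images-idx3-ubyte" this gives "train-labels-idx1-ubyte", but with "train-images.idx3-ubyte" nothing matches and the images path comes back unchanged. Under that assumption the loader would read label bytes out of the images file. The IDX3 images header stores the row and column counts (28 for MNIST), which would be consistent with the reported "Label 28", though that part is a guess.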

hughperkins commented 8 years ago

Addressed in b32766f. Can you pull down the latest version, and retry?

hughperkins commented 8 years ago

(Hmmm, probably would be better if MnistLoader replaced "images" with "labels" and "idx3" with "idx1". Anyway, I think the current version should work for you now?)
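
A rough sketch of that alternative, for illustration only: this is not the code in MnistLoader, and replaceLast and labelsPathFor are hypothetical names. It swaps the two tokens independently, so whatever separator sits between them is preserved, and it replaces the last occurrence so that a directory named "images" earlier in the path is left alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces the last occurrence of `from` in `text`,
// returning `text` unchanged when `from` is absent.
std::string replaceLast(std::string text, const std::string &from, const std::string &to) {
    assert(!from.empty());  // an empty pattern would be a caller bug
    const std::string::size_type pos = text.rfind(from);
    if (pos != std::string::npos) {
        text.replace(pos, from.size(), to);
    }
    return text;
}

// Derives the labels path from the images path by swapping "images" for
// "labels" and "idx3" for "idx1", keeping the '-' or '.' separator as-is.
std::string labelsPathFor(const std::string &imagesFilePath) {
    const std::string labelsFilePath = replaceLast(imagesFilePath, "images", "labels");
    return replaceLast(labelsFilePath, "idx3", "idx1");
}
```

With this, labelsPathFor("train-images.idx3-ubyte") yields "train-labels.idx1-ubyte" and labelsPathFor("/mnist/t10k-images-idx3-ubyte") yields "/mnist/t10k-labels-idx1-ubyte". A path containing neither token is returned unchanged, so a real loader would probably want to treat that case as an error rather than open the images file twice.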

ghost commented 8 years ago

Yes, it works. Thank you very much! :+1: :-)

hughperkins commented 8 years ago

Cool :-)