blacky-i closed this issue 8 years ago.
Can you provide:
I guess it might be because you need to call clewInit(); see https://github.com/hughperkins/EasyCL/blob/master/EasyCL.cpp#L51-L56 :
#ifdef USE_CLEW
    bool clpresent = 0 == clewInit();
    if(!clpresent) {
        throw std::runtime_error("OpenCL library not found");
    }
#endif
Thanks for the reply! Here is my piece of code:
#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include "network.h"
int main(int argc, char *argv[])
{
    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);
    // this piece of code returns my OpenCL platform; clblasSetup() causes a segmentation fault
    cl_uint numberPlatform;
    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }
    ClBlasInstance();
    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));
    return app.exec();
}
auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
This is the line where the program stops with a segmentation fault, in toolslib.c. It works fine in the main function; it crashes only in clblasSetup().
Here is the call stack:
1 ??
2 getPlatforms toolslib.c 449 0x7ffff542202d
3 initStorageCache toolslib.c 482 0x7ffff54220cd
4 clblasSetup init.c 212 0x7ffff53cf61b
5 main main.cpp 18 0x403b18
I don't have any visibility into what your program is doing. Can you try to simplify your example so it doesn't bring in any of your own code, please? Also, you seem to be missing the include file for ClBlasInstance()?
From the code you have posted, it looks like you haven't created an OpenCL context. I don't quite remember how clblasSetup() interacts with OpenCL, but I reckon it probably needs a context.
Ok, I was investigating how DeepCL works and tried to run your example.
#include <QObject>
#include <QDir>
#include <QStandardPaths>
#include <QDebug>
#include <QImage>
#include <QRgb>
#include<iostream>
#include "imagenetbatchinfo.h"
#include "DeepCL.h"
EasyCL *cl = new EasyCL();
NeuralNet *net = new NeuralNet(cl);
net->addLayer( InputLayerMaker::instance()->numPlanes(4)->imageSize(28) );
net->addLayer( NormalizationLayerMaker::instance()->translate( -2 )->scale( 1.0f / 2 ) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(8)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(2) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(16)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(3) );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->linear() );
net->addLayer( SoftMaxMaker::instance() );
qDebug()<<"heelo! 1.5";
net->print();
qDebug()<<"heelo! 2";
// create a Trainer object, currently SGD,
// passing in learning rate, and momentum:
Trainer *trainer = SGD::instance( cl, 0.02f, 0.0f );
int numEpochs = 10;
int batchSize = 10;
QDir pictureLocation;
pictureLocation.cdUp();
pictureLocation.cd("CNN");
//cout<<pictureLocation.absolutePath().toStdString();
//GenericLoaderv2 trainL(QString(pictureLocation.absolutePath()+"/train-images.idx3-ubyte").toStdString());
QImage img(pictureLocation.absolutePath()+"/data/n01440764/n01440764_18.JPEG");
std::vector<qreal> v_data;
float data[img.height()*img.width()*3];
for(int i=0;i<img.width();i++)
{
    for(int j=0;j<img.height();j++)
    {
        qreal r,g,b;
        img.pixelColor(i,j).getRgbF(&r,&g,&b);
        v_data.push_back(r);
        v_data.push_back(g);
        v_data.push_back(b);
    }
}
int offset=0;
for ( auto it = v_data.rbegin(); it != v_data.rend(); ++it )
    data[offset++]=*it;
qDebug()<<img.width()*img.height()<<sizeof(data);
int constLabels[] = {1};
NetLearner netL(trainer,net,1,data,constLabels,1,data,constLabels,187500);
netL.run();
But during backpropagation I got a ClblasNotInitializedException. GDB then showed me that, in the toolslib.c file, this code:
cl_uint numberPlatform;
auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
if (ret != CL_SUCCESS || numberPlatform == 0) {
    return 0;
}
segfaults. But outside of the clBLAS library this code works fine, returning my OpenCL platform.
Please help me: is this a library issue?
Hi Blacky-i,
auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
I have also tried to run the example code in the clBLAS library; I am getting the same issue.
I rebuilt the clBLAS library from scratch. That doesn't help either.
Notebook: ASUS UX32VD, NVIDIA 620M, CUDA 8.0.
I don't see that line in the source code you provided at https://github.com/hughperkins/DeepCL/issues/94#issuecomment-248586455? Alternatively, if you're in gdb, can you type bt and paste the entire output, please?
I am sorry, I was rushing. When I got the ClblasNotInitializedException,
I tried to initialize it manually. That is why I mentioned auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
Backtrace of running the source code at #94 (comment):
Thread 1 "CNN" received signal SIGABRT, Aborted.
0x00007ffff55f5418 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff55f5418 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff55f701a in __GI_abort () at abort.c:89
#2 0x00007ffff5c2e84d in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5c2c6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff5c2c701 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff5c2c919 in __cxa_throw ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff7b5a136 in ClBlasHelper::Gemm (cl=0x627720,
order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans,
bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>,
n=150, alpha=alpha@entry=1, AWrapper=0x99be40, aOffset=0,
BWrapper=0x931610, bOffset=0, beta=beta@entry=0, CWrapper=0xc09960,
cOffset=0)
at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7 0x00007ffff7b69e33 in BackwardIm2Col::backward (this=0xc09e00,
batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x99be40,
weightsWrapper=0x931610, gradInputWrapper=0x99b630)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8 0x00007ffff7b69392 in BackwardAuto::backward (this=0x931520,
batchSize=<optimized out>, inputDataWrapper=<optimized out>,
---Type <return> to continue, or q <return> to quit---
gradOutput=<optimized out>, weightsWrapper=<optimized out>,
gradInput=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9 0x00007ffff7b72143 in ConvolutionalLayer::backward (this=0x946cf0)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7b8bd6b in NeuralNet::backward (this=this@entry=0x866b10,
outputData=outputData@entry=0x7fffffdd7f70)
at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7b9b997 in SGD::trainNet (this=0x96ab20, net=0x866b10,
context=<optimized out>, input=0x7fffffdd80c0, outputData=0x7fffffdd7f70)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7b9bbb8 in SGD::trainNetFromLabels (this=0x96ab20,
net=0x866b10, context=0x7fffffdd8010, input=0x7fffffdd80c0,
labels=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff7b9c69f in Trainer::trainFromLabels (this=0x96ab20,
trainable=<optimized out>, context=0x7fffffdd8010, input=0x7fffffdd80c0,
labels=0x7fffffffd760)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff7b602a5 in LearnBatcher::internalTick (this=0x9730a0,
epoch=<optimized out>, batchData=0x7fffffdd80c0,
batchLabels=0x7fffffffd760)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff7b603e8 in Batcher::tick (this=0x9730a0, epoch=2)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff7b60cdf in NetLearner::tickBatch (this=0x7fffffffd720)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff7b60c61 in NetLearner::tickEpoch (this=0x7fffffffd720)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff7b60be9 in NetLearner::run (this=0x7fffffffd720)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x00000000004048ea in network::network (this=0x7fffffffd830, parent=0x0)
at ../CNN/network.cpp:65
#20 0x0000000000403a7a in main () at ../CNN/main.cpp:81
This seems not to have the same number of lines as the earlier source code? Can you post the exact source code associated with this stack trace, please?
My full source code:
main.cpp
#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include "network.h"
int main(int argc, char *argv[])
{
    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);
    cl_uint numberPlatform;
    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }
    //ClBlasInstance();
    network();
    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));
    return app.exec();
}
network.h
#ifndef NETWORK_H
#define NETWORK_H
#include<iostream>
#include "imagenetbatchinfo.h"
#include "DeepCL.h"
#include <QObject>
#include <QDir>
#include <QStandardPaths>
#include <QDebug>
#include <QImage>
#include <QRgb>
class network : public QObject
{
    Q_OBJECT
public:
    explicit network(QObject *parent = 0);
signals:
public slots:
};
#endif // NETWORK_H
network.cpp
#include "network.h"
network::network(QObject *parent) : QObject(parent)
{
qDebug()<<"heelo! 1";
EasyCL *cl = new EasyCL();
qDebug()<<"heelo! 1.1";
NeuralNet *net = new NeuralNet(cl);
qDebug()<<"heelo! 1.2";
net->addLayer( InputLayerMaker::instance()->numPlanes(4)->imageSize(28) );
net->addLayer( NormalizationLayerMaker::instance()->translate( -2 )->scale( 1.0f / 2 ) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(8)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(2) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(16)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(3) );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->linear() );
net->addLayer( SoftMaxMaker::instance() );
qDebug()<<"heelo! 1.5";
net->print();
qDebug()<<"heelo! 2";
// create a Trainer object, currently SGD,
// passing in learning rate, and momentum:
Trainer *trainer = SGD::instance( cl, 0.02f, 0.0f );
int numEpochs = 10;
int batchSize = 10;
QDir pictureLocation;
pictureLocation.cdUp();
pictureLocation.cd("CNN");
//cout<<pictureLocation.absolutePath().toStdString();
//GenericLoaderv2 trainL(QString(pictureLocation.absolutePath()+"/train-images.idx3-ubyte").toStdString());
QImage img(pictureLocation.absolutePath()+"/data/n01440764/n01440764_18.JPEG");
std::vector<qreal> v_data;
float data[img.height()*img.width()*3];
for(int i=0;i<img.width();i++)
{
    for(int j=0;j<img.height();j++)
    {
        qreal r,g,b;
        img.pixelColor(i,j).getRgbF(&r,&g,&b);
        v_data.push_back(r);
        v_data.push_back(g);
        v_data.push_back(b);
    }
}
int offset=0;
for ( auto it = v_data.rbegin(); it != v_data.rend(); ++it )
    data[offset++]=*it;
qDebug()<<img.width()*img.height()<<sizeof(data);
int constLabels[] = {1};
NetLearner netL(trainer,net,1,data,constLabels,1,data,constLabels,187500);
netL.run();
//ImageNetBatchInfo batch(pictureLocation.absolutePath()+"/data/n01440764");
//qDebug()<<img.bitPlaneCount();
//NetLearner(trainer,net,100,trainL.)
// NetLearnerOnDemandv2 netLearner( trainer, net,
// Ntrain, trainData, trainLabels,
// Ntest, testData, testLabels );
// netLearner.setSchedule( numEpochs );
// netLearner.setBatchSize( batchSize );
// netLearner.learn();
// // learning is now done :-)
// // (create a net, as above)
// // (train it, as above)
// // test, eg:
// BatchLearner batchLearner( net );
// int testNumRight = batchLearner.test( batchSize, Ntest, testData, testLabels );
}
backtrace
Thread 1 "CNN" received signal SIGABRT, Aborted.
0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff58d501a in __GI_abort () at abort.c:89
#2 0x00007ffff5f0c84d in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5f0a6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff5f0a701 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff5f0a919 in __cxa_throw ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff7954136 in ClBlasHelper::Gemm (cl=0x669760,
order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans,
bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>,
n=150, alpha=alpha@entry=1, AWrapper=0x9e8140, aOffset=0,
BWrapper=0x97eb90, bOffset=0, beta=beta@entry=0, CWrapper=0xc39b10,
cOffset=0)
at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7 0x00007ffff7963e33 in BackwardIm2Col::backward (this=0xc4d2c0,
batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x9e8140,
weightsWrapper=0x97eb90, gradInputWrapper=0x9e7930)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8 0x00007ffff7963392 in BackwardAuto::backward (this=0x97eaa0,
batchSize=<optimized out>, inputDataWrapper=<optimized out>,
---Type <return> to continue, or q <return> to quit---
gradOutput=<optimized out>, weightsWrapper=<optimized out>,
gradInput=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9 0x00007ffff796c143 in ConvolutionalLayer::backward (this=0x994270)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7985d6b in NeuralNet::backward (this=this@entry=0x8b0510,
outputData=outputData@entry=0x7fffffdd7fe0)
at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7995997 in SGD::trainNet (this=0x9b80a0, net=0x8b0510,
context=<optimized out>, input=0x7fffffdd8130, outputData=0x7fffffdd7fe0)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7995bb8 in SGD::trainNetFromLabels (this=0x9b80a0,
net=0x8b0510, context=0x7fffffdd8080, input=0x7fffffdd8130,
labels=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff799669f in Trainer::trainFromLabels (this=0x9b80a0,
trainable=<optimized out>, context=0x7fffffdd8080, input=0x7fffffdd8130,
labels=0x7fffffffd7d0)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff795a2a5 in LearnBatcher::internalTick (this=0x9c0260,
epoch=<optimized out>, batchData=0x7fffffdd8130,
batchLabels=0x7fffffffd7d0)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff795a3e8 in Batcher::tick (this=0x9c0260, epoch=2)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff795acdf in NetLearner::tickBatch (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff795ac61 in NetLearner::tickEpoch (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff795abe9 in NetLearner::run (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x00000000004047c6 in network::network (this=0x7fffffffd880, parent=0x0)
at ../CNN/network.cpp:65
#20 0x0000000000403a6d in main (argc=1, argv=0x7fffffffd9a8)
at ../CNN/main.cpp:21
gdb output
heelo! 1.1
heelo! 1.2
heelo! 1.5
layer 0:InputLayer{ outputPlanes=4 outputSize=28 }
layer 1:NormalizationLayer{ outputPlanes=4 outputSize=28 translate=-2 scale=0.5 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=4 inputSize=28 numFilters=8 filterSize=5 outputSize=24 padZeros=0 biased=1 skip=0} }
layer 3:ActivationLayer{ RELU }
layer 4:PoolingLayer{ inputPlanes=8 inputSize=24 poolingSize=2 }
layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=8 inputSize=12 numFilters=16 filterSize=5 outputSize=8 padZeros=0 biased=1 skip=0} }
layer 6:ActivationLayer{ RELU }
layer 7:PoolingLayer{ inputPlanes=16 inputSize=8 poolingSize=3 }
layer 8:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
layer 9:ActivationLayer{ RELU }
layer 10:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
layer 11:ActivationLayer{ LINEAR }
layer 12:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
Parameters overview: (skipping 8 layers with 0 params)
layer 1: params=2 0.0%
layer 2: params=808 5.3%
layer 5: params=3216 21.0%
layer 8: params=9750 63.8%
layer 10: params=1510 9.9%
TOTAL : params=15286
heelo! 2
187500 2250000
statefultimer v0.7
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
... seems valid
ForwardAuto: kernel 1 0ms
backward try kernel 0
... not plausibly optimal, skipping
backward try kernel 1
... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
... not plausibly optimal, skipping
calcGradWeights try kernel 1
... seems valid
BackpropWeightsAuto: kernel 1 0ms
backward try kernel 0
... not plausibly optimal, skipping
backward try kernel 1
... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
... not plausibly optimal, skipping
calcGradWeights try kernel 1
... seems valid
BackpropWeightsAuto: kernel 1 0ms
backward try kernel 0
... not plausibly optimal, skipping
backward try kernel 1
... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
... not plausibly optimal, skipping
calcGradWeights try kernel 1
... seems valid
BackpropWeightsAuto: kernel 1 0ms
calcGradWeights try kernel 0
... not plausibly optimal, skipping
calcGradWeights try kernel 1
... seems valid
BackpropWeightsAuto: kernel 1 0ms
after epoch 1 7 ms
training loss: 2.69564
train accuracy: 0/1 0%
forward try kernel 2
... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
... seems valid
ForwardAuto: kernel 2 0ms
test accuracy: 1/1 100%
after tests 1 ms
forward try kernel 3
... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
... seems valid
ForwardAuto: kernel 3 0ms
backward try kernel 2
... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
... seems valid
BackpropWeightsAuto: kernel 2 0ms
backward try kernel 2
... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
... seems valid
BackpropWeightsAuto: kernel 2 0ms
backward try kernel 2
... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
... seems valid
BackpropWeightsAuto: kernel 2 0ms
calcGradWeights try kernel 2
... seems valid
BackpropWeightsAuto: kernel 2 0ms
after epoch 2 7 ms
training loss: 1.79052
train accuracy: 1/1 100%
forward try kernel 4
... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
... seems valid
ForwardAuto: kernel 4 0ms
test accuracy: 1/1 100%
after tests 1 ms
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
... not valid
forward try kernel 6
... seems valid
ForwardAuto: kernel 6 0ms
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
... not valid
forward try kernel 6
... seems valid
ForwardAuto: kernel 6 0ms
forward try kernel 5
... seems valid
ForwardAuto: kernel 5 0ms
forward try kernel 5
... seems valid
ForwardAuto: kernel 5 0ms
backward try kernel 3
... seems valid
Didnt initialize clBLAS
terminate called after throwing an instance of 'ClblasNotInitializedException'
Thanks. In main.cpp, you currently have the Qt includes before the DeepCL ones:
#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include "network.h"
Can you reverse the order please?
#include "network.h"
#include <QGuiApplication>
#include <QQmlApplicationEngine>
Also, I'm not sure what is in imagenetbatchinfo.h, but can you put it after the DeepCL includes in network.h, please? :
#ifndef NETWORK_H
#define NETWORK_H
#include "DeepCL.h"
#include<iostream>
#include "imagenetbatchinfo.h"
Also, can you uncomment the ClBlasInstance, so I get the stack trace at the point of the segfault?
I've reversed the order as you said; it doesn't change anything.
Backtrace with ClBlasInstance() uncommented:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff500202d in getPlatforms (
platforms=platforms@entry=0x7fffffffd798)
at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/tools/tune/toolslib.c:449
#2 0x00007ffff50020cd in initStorageCache ()
at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/tools/tune/toolslib.c:482
#3 0x00007ffff4faf61b in clblasSetup ()
at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/blas/init.c:212
#4 0x0000000000403b18 in main (argc=1, argv=0x7fffffffd9a8)
at ../CNN/main.cpp:22
gdb output:
Starting program: /home/bitummon/Projects/Qt/build-CNN-Desktop_Qt_5_7_0_GCC_64bit-Debug/CNN
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
QML debugging is enabled. Only use this in a safe environment.
[New Thread 0x7fffecbc1700 (LWP 5560)]
[New Thread 0x7fffe77b6700 (LWP 5561)]
Thread 1 "CNN" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
also adding clinfo output:
clinfo: /usr/local/cuda/lib64/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 8.0.20
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GT 620M
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.1 CUDA
Driver Version 361.42
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 01:00.0
Max compute units 2
Max clock frequency 1250MHz
Compute Capability (NV) 2.1
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 1073479680 (1024MiB)
Error Correction support No
Max memory allocation 268369920 (255.9MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 16
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 32768
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 1
Device Available Yes
Compiler Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Ok. What happens if you put this at the start of your main function?
bool clpresent = 0 == clewInit();
if(!clpresent) {
    throw std::runtime_error("OpenCL library not found");
}
main.cpp:
#include <clew.h>
#include "network.h"
#include <QGuiApplication>
#include <QQmlApplicationEngine>
int main(int argc, char *argv[])
{
    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);
    bool clpresent = 0 == clewInit();
    if(!clpresent) {
        throw std::runtime_error("OpenCL library not found");
    }
    cl_uint numberPlatform;
    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }
    ClBlasInstance();
    network();
    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));
    return app.exec();
}
clpresent = true, numberPlatform = 1, ret = 0.
ClBlasInstance() now works, but there is an error at netLearner again.
backtrace:
#0 0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff58d501a in __GI_abort () at abort.c:89
#2 0x00007ffff5f0c84d in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5f0a6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff5f0a701 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff5f0a919 in __cxa_throw ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff7b5a136 in ClBlasHelper::Gemm (cl=0x6843a0,
order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans,
bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>,
n=150, alpha=alpha@entry=1, AWrapper=0x9e8460, aOffset=0,
BWrapper=0x97eeb0, bOffset=0, beta=beta@entry=0, CWrapper=0xc5ae80,
cOffset=0)
at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7 0x00007ffff7b69e33 in BackwardIm2Col::backward (this=0xc4b070,
batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x9e8460,
weightsWrapper=0x97eeb0, gradInputWrapper=0x9e7c50)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8 0x00007ffff7b69392 in BackwardAuto::backward (this=0x97edc0,
batchSize=<optimized out>, inputDataWrapper=<optimized out>,
---Type <return> to continue, or q <return> to quit---
gradOutput=<optimized out>, weightsWrapper=<optimized out>,
gradInput=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9 0x00007ffff7b72143 in ConvolutionalLayer::backward (this=0x994590)
at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7b8bd6b in NeuralNet::backward (this=this@entry=0x8b0830,
outputData=outputData@entry=0x7fffffdd7fe0)
at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7b9b997 in SGD::trainNet (this=0x9b83c0, net=0x8b0830,
context=<optimized out>, input=0x7fffffdd8130, outputData=0x7fffffdd7fe0)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7b9bbb8 in SGD::trainNetFromLabels (this=0x9b83c0,
net=0x8b0830, context=0x7fffffdd8080, input=0x7fffffdd8130,
labels=<optimized out>)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff7b9c69f in Trainer::trainFromLabels (this=0x9b83c0,
trainable=<optimized out>, context=0x7fffffdd8080, input=0x7fffffdd8130,
labels=0x7fffffffd7d0)
at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff7b602a5 in LearnBatcher::internalTick (this=0x9c0580,
epoch=<optimized out>, batchData=0x7fffffdd8130,
batchLabels=0x7fffffffd7d0)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff7b603e8 in Batcher::tick (this=0x9c0580, epoch=2)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff7b60cdf in NetLearner::tickBatch (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff7b60c61 in NetLearner::tickEpoch (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff7b60be9 in NetLearner::run (this=0x7fffffffd790)
at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x0000000000404930 in network::network (this=0x7fffffffd880, parent=0x0)
at ../CNN/network.cpp:65
#20 0x0000000000403bc8 in main (argc=1, argv=0x7fffffffd9a8)
at ../CNN/main.cpp:29
And now I have strange behavior: if I remove the lines you mentioned at https://github.com/hughperkins/DeepCL/issues/94#issuecomment-249112023 , I get a segfault at this line in main.cpp: auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
But if I remove #include <clew.h>
as well, they work fine.
You also need to create an OpenCL context before instantiating a ClBlasInstance instance. By the way, ClBlasInstance is not really a method, it's an object. Can you put the following at the start of your main, and remove the call to ClBlasInstance()?
EasyCL *cl = 0;
if(config.gpuIndex >= 0) {
    cl = EasyCL::createForIndexedGpu(config.gpuIndex);
} else {
    cl = EasyCL::createForFirstGpuOtherwiseCpu();
}
ClBlasInstance blasInstance;
(and pass the cl object into your network() function too, please)
Ok, but what is the config parameter? The compiler does not know what it is.
I've added this:
EasyCL *cl = 0;
cl = EasyCL::createForFirstGpuOtherwiseCpu();
ClBlasInstance blasInstance;
and everything works fine!
No crashes at all. So to sum up: for some reason the linker cannot properly find the OpenCL library?
Ok, so there are a few things: you need to call clewInit() first, to pull in the OpenCL library at runtime. (Oh, and finally, the clBLAS library expects an OpenCL context to have been created before you initialize it, I'm fairly sure; I can't quite remember how it finds the context though, so my understanding is a bit hazy on this last point.)
Ok, thanks very much for the help!
Cool :-)
Hello! I got a segfault on ClBlasInstance();
on this piece of code:
cl_int ret;
cl_uint numberPlatform;
ret = clGetPlatformIDs(0, NULL, &numberPlatform);
This is where it crashes. The same code works fine outside of the library, but inside the library it crashes.
OpenCL installed. CUDA installed. Ubuntu 16.04.