hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

Seg Fault on ClBlasInstance(); #94

Closed. blacky-i closed this issue 8 years ago.

blacky-i commented 8 years ago

Hello! I got a segfault on ClBlasInstance();

on this piece of code:

    cl_int ret;
    cl_uint numberPlatform;

    ret = clGetPlatformIDs(0, NULL, &numberPlatform);

    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }

ret = clGetPlatformIDs(0, NULL, &numberPlatform); is the line that crashes.

This piece of code works fine outside the library, but inside the library it crashes.

OpenCL installed. CUDA installed. Ubuntu 16.04.

hughperkins commented 8 years ago

Can you provide some more details?

I guess it might be because you need to call clewInit(); see https://github.com/hughperkins/EasyCL/blob/master/EasyCL.cpp#L51-L56 :

    #ifdef USE_CLEW
        bool clpresent = 0 == clewInit();
        if(!clpresent) {
            throw std::runtime_error("OpenCL library not found");
        }
    #endif
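
For context, clew binds the OpenCL entry points at runtime: until clewInit() has loaded libOpenCL.so, the API symbols are null function pointers. Below is a minimal sketch of the snippet from the first comment with that call added first, assuming the program is built against clew the way EasyCL is when USE_CLEW is defined:

    #include <clew.h>
    #include <stdexcept>

    int main() {
        // clewInit() dlopens the OpenCL library and fills in the function
        // pointers; it returns 0 on success, as the EasyCL snippet above checks.
        if (clewInit() != 0) {
            throw std::runtime_error("OpenCL library not found");
        }

        cl_uint numberPlatform = 0;
        cl_int ret = clGetPlatformIDs(0, NULL, &numberPlatform);  // safe after clewInit()
        return (ret == CL_SUCCESS && numberPlatform > 0) ? 0 : 1;
    }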
blacky-i commented 8 years ago

Thanks for the reply! Here is my piece of code:

#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include "network.h"
int main(int argc, char *argv[])
{
    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);

    // this piece of code returns my OpenCL platform; clblasSetup() causes a segmentation fault
    cl_uint numberPlatform;

    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }

    ClBlasInstance();

    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));

    return app.exec();
}

auto ret = clGetPlatformIDs(0, NULL, &numberPlatform); is the line where the program stops with a segmentation fault, in toolslib.c. In my main function it works fine; it crashes only inside clblasSetup().

Here is call stack

1 ??                                             
2 getPlatforms     toolslib.c 449 0x7ffff542202d 
3 initStorageCache toolslib.c 482 0x7ffff54220cd 
4 clblasSetup      init.c     212 0x7ffff53cf61b 
5 main             main.cpp   18  0x403b18       
hughperkins commented 8 years ago

I don't have any visibility into what your program is doing. Can you try to simplify your example so it doesn't bring in any of your own code, please? Also, the include file for ClBlasInstance() seems to be missing?

From the code you have posted, it looks like you haven't created an OpenCL context. I don't quite remember how clblasSetup() interacts with OpenCL, but I reckon it probably needs a context.
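
For illustration, here is a minimal standalone sketch of that ordering, using only documented OpenCL and clBLAS calls: resolve a platform, device, and context first, and only then run clblasSetup(). Error handling is abbreviated.

    #include <clBLAS.h>   // pulls in CL/cl.h
    #include <cstdio>

    int main() {
        cl_platform_id platform;
        cl_device_id device;
        cl_int err;

        // Create an OpenCL context *before* initializing clBLAS.
        err = clGetPlatformIDs(1, &platform, NULL);
        if (err != CL_SUCCESS) { return 1; }
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        if (err != CL_SUCCESS) { return 1; }
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        if (err != CL_SUCCESS) { return 1; }

        err = clblasSetup();           // initialize clBLAS once per process
        std::printf("clblasSetup: %d\n", err);

        clblasTeardown();              // matching teardown when done
        clReleaseContext(ctx);
        return 0;
    }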

blacky-i commented 8 years ago

OK, I was investigating how DeepCL works and tried to run your example:

    #include <QObject>
    #include <QDir>
    #include <QStandardPaths>
    #include <QDebug>
    #include <QImage>
    #include <QRgb>

    #include <iostream>
    #include "imagenetbatchinfo.h"
    #include "DeepCL.h"

    EasyCL *cl = new EasyCL();
    NeuralNet *net = new NeuralNet(cl);
    net->addLayer( InputLayerMaker::instance()->numPlanes(4)->imageSize(28) );
    net->addLayer( NormalizationLayerMaker::instance()->translate( -2 )->scale( 1.0f / 2 ) );
    net->addLayer( ConvolutionalMaker::instance()->numFilters(8)->filterSize(5)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( PoolingMaker::instance()->poolingSize(2) );
    net->addLayer( ConvolutionalMaker::instance()->numFilters(16)->filterSize(5)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( PoolingMaker::instance()->poolingSize(3) );
    net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
    net->addLayer( ActivationMaker::instance()->linear() );
    net->addLayer( SoftMaxMaker::instance() );
    qDebug()<<"heelo!  1.5";
    net->print();
    qDebug()<<"heelo! 2";

    // create a Trainer object, currently SGD,
    // passing in learning rate, and momentum:
    Trainer *trainer = SGD::instance( cl, 0.02f, 0.0f );
    int numEpochs = 10;
    int batchSize = 10;
    QDir pictureLocation;
    pictureLocation.cdUp();
    pictureLocation.cd("CNN");

    //cout<<pictureLocation.absolutePath().toStdString();
    //GenericLoaderv2 trainL(QString(pictureLocation.absolutePath()+"/train-images.idx3-ubyte").toStdString());

    QImage img(pictureLocation.absolutePath()+"/data/n01440764/n01440764_18.JPEG");
    std::vector<qreal> v_data;
    float data[img.height()*img.width()*3];

    for(int i=0;i<img.width();i++)
    {
        for(int j=0;j<img.height();j++)
        {
            qreal r,g,b;
            img.pixelColor(i,j).getRgbF(&r,&g,&b);
            v_data.push_back(r);
            v_data.push_back(g);
            v_data.push_back(b);
        }
    }

    int offset=0;
    for ( auto it = v_data.rbegin(); it != v_data.rend(); ++it )
        data[offset++]=*it;
    qDebug()<<img.width()*img.height()<<sizeof(data);

    int constLabels[] = {1};

    NetLearner netL(trainer,net,1,data,constLabels,1,data,constLabels,187500);
    netL.run();

But during backpropagation I got an exception: ClblasNotInitializedException. Then GDB showed me that this code in the toolslib.c file

    cl_uint numberPlatform;

    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }

segfaults. But outside of the clBLAS library this same code works fine, returning my OpenCL platform.

Please help me. Is this a library issue?

hughperkins commented 8 years ago

Hi Blacky-i,

blacky-i commented 8 years ago

I have also tried to run the example code in the clBLAS library; I am getting the same issue.

I rebuilt the clBLAS library from scratch. That also doesn't help.

Notebook ASUS UX32VD. Nvidia 620M. CUDA 8.0

hughperkins commented 8 years ago

I don't see that line in the source code you provided at https://github.com/hughperkins/DeepCL/issues/94#issuecomment-248586455? Alternatively, if you're in gdb, can you type bt and paste the entire output, please?

blacky-i commented 8 years ago

I am sorry, I was rushing. When I got ClblasNotInitializedException I tried to initialize it manually. That is why I mentioned auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);

Backtrace of running the source code at #94 (comment):

Thread 1 "CNN" received signal SIGABRT, Aborted.
0x00007ffff55f5418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff55f5418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff55f701a in __GI_abort () at abort.c:89
#2  0x00007ffff5c2e84d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff5c2c6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff5c2c701 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff5c2c919 in __cxa_throw ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7b5a136 in ClBlasHelper::Gemm (cl=0x627720, 
    order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans, 
    bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>, 
    n=150, alpha=alpha@entry=1, AWrapper=0x99be40, aOffset=0, 
    BWrapper=0x931610, bOffset=0, beta=beta@entry=0, CWrapper=0xc09960, 
    cOffset=0)
    at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7  0x00007ffff7b69e33 in BackwardIm2Col::backward (this=0xc09e00, 
    batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x99be40, 
    weightsWrapper=0x931610, gradInputWrapper=0x99b630)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8  0x00007ffff7b69392 in BackwardAuto::backward (this=0x931520, 
    batchSize=<optimized out>, inputDataWrapper=<optimized out>, 
---Type <return> to continue, or q <return> to quit---
    gradOutput=<optimized out>, weightsWrapper=<optimized out>, 
    gradInput=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9  0x00007ffff7b72143 in ConvolutionalLayer::backward (this=0x946cf0)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7b8bd6b in NeuralNet::backward (this=this@entry=0x866b10, 
    outputData=outputData@entry=0x7fffffdd7f70)
    at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7b9b997 in SGD::trainNet (this=0x96ab20, net=0x866b10, 
    context=<optimized out>, input=0x7fffffdd80c0, outputData=0x7fffffdd7f70)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7b9bbb8 in SGD::trainNetFromLabels (this=0x96ab20, 
    net=0x866b10, context=0x7fffffdd8010, input=0x7fffffdd80c0, 
    labels=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff7b9c69f in Trainer::trainFromLabels (this=0x96ab20, 
    trainable=<optimized out>, context=0x7fffffdd8010, input=0x7fffffdd80c0, 
    labels=0x7fffffffd760)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff7b602a5 in LearnBatcher::internalTick (this=0x9730a0, 
    epoch=<optimized out>, batchData=0x7fffffdd80c0, 
    batchLabels=0x7fffffffd760)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff7b603e8 in Batcher::tick (this=0x9730a0, epoch=2)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff7b60cdf in NetLearner::tickBatch (this=0x7fffffffd720)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff7b60c61 in NetLearner::tickEpoch (this=0x7fffffffd720)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff7b60be9 in NetLearner::run (this=0x7fffffffd720)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x00000000004048ea in network::network (this=0x7fffffffd830, parent=0x0)
    at ../CNN/network.cpp:65
#20 0x0000000000403a7a in main () at ../CNN/main.cpp:81
hughperkins commented 8 years ago

20 0x0000000000403a7a in main () at ../CNN/main.cpp:81

This seems not to have the same number of lines as the earlier source code? Can you post the exact source code associated with this stack trace, please?

blacky-i commented 8 years ago

My full source code:

main.cpp

#include <QGuiApplication>
#include <QQmlApplicationEngine>

#include "network.h"
int main(int argc, char *argv[])
{

    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);

    cl_uint numberPlatform;

    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }

    //ClBlasInstance();
    network();

    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));

    return app.exec();
}

network.h

#ifndef NETWORK_H
#define NETWORK_H
#include<iostream>
#include "imagenetbatchinfo.h"
#include "DeepCL.h"

#include <QObject>
#include <QDir>
#include <QStandardPaths>
#include <QDebug>
#include <QImage>
#include <QRgb>

class network : public QObject
{
    Q_OBJECT
public:
    explicit network(QObject *parent = 0);

signals:

public slots:
};

#endif // NETWORK_H

network.cpp

#include "network.h"

network::network(QObject *parent) : QObject(parent)
{

    qDebug()<<"heelo! 1";
    EasyCL *cl = new EasyCL();
    qDebug()<<"heelo! 1.1";
    NeuralNet *net = new NeuralNet(cl);
    qDebug()<<"heelo! 1.2";
    net->addLayer( InputLayerMaker::instance()->numPlanes(4)->imageSize(28) );
    net->addLayer( NormalizationLayerMaker::instance()->translate( -2 )->scale( 1.0f / 2 ) );
    net->addLayer( ConvolutionalMaker::instance()->numFilters(8)->filterSize(5)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( PoolingMaker::instance()->poolingSize(2) );
    net->addLayer( ConvolutionalMaker::instance()->numFilters(16)->filterSize(5)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( PoolingMaker::instance()->poolingSize(3) );
    net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased() );
    net->addLayer( ActivationMaker::instance()->relu() );
    net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
    net->addLayer( ActivationMaker::instance()->linear() );
    net->addLayer( SoftMaxMaker::instance() );
    qDebug()<<"heelo!  1.5";
    net->print();
    qDebug()<<"heelo! 2";

    // create a Trainer object, currently SGD,
    // passing in learning rate, and momentum:
    Trainer *trainer = SGD::instance( cl, 0.02f, 0.0f );
    int numEpochs = 10;
    int batchSize = 10;
    QDir pictureLocation;
    pictureLocation.cdUp();
    pictureLocation.cd("CNN");

    //cout<<pictureLocation.absolutePath().toStdString();
    //GenericLoaderv2 trainL(QString(pictureLocation.absolutePath()+"/train-images.idx3-ubyte").toStdString());

    QImage img(pictureLocation.absolutePath()+"/data/n01440764/n01440764_18.JPEG");
    std::vector<qreal> v_data;
    float data[img.height()*img.width()*3];

    for(int i=0;i<img.width();i++)
    {
        for(int j=0;j<img.height();j++)
        {
            qreal r,g,b;
            img.pixelColor(i,j).getRgbF(&r,&g,&b);
            v_data.push_back(r);
            v_data.push_back(g);
            v_data.push_back(b);
        }
    }

    int offset=0;
    for ( auto it = v_data.rbegin(); it != v_data.rend(); ++it )
        data[offset++]=*it;
    qDebug()<<img.width()*img.height()<<sizeof(data);

    int constLabels[] = {1};

    NetLearner netL(trainer,net,1,data,constLabels,1,data,constLabels,187500);
    netL.run();

    //ImageNetBatchInfo batch(pictureLocation.absolutePath()+"/data/n01440764");

    //qDebug()<<img.bitPlaneCount();
    //NetLearner(trainer,net,100,trainL.)
    //    NetLearnerOnDemandv2 netLearner( trainer, net,
    //        Ntrain, trainData, trainLabels,
    //        Ntest, testData, testLabels );
    //    netLearner.setSchedule( numEpochs );
    //    netLearner.setBatchSize( batchSize );
    //    netLearner.learn();
    //    // learning is now done :-)
    //    // (create a net, as above)
    //    // (train it, as above)
    //    // test, eg:
    //    BatchLearner batchLearner( net );
    //    int testNumRight = batchLearner.test( batchSize, Ntest, testData, testLabels );
}

backtrace

Thread 1 "CNN" received signal SIGABRT, Aborted.
0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff58d501a in __GI_abort () at abort.c:89
#2  0x00007ffff5f0c84d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff5f0a6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff5f0a701 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff5f0a919 in __cxa_throw ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7954136 in ClBlasHelper::Gemm (cl=0x669760, 
    order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans, 
    bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>, 
    n=150, alpha=alpha@entry=1, AWrapper=0x9e8140, aOffset=0, 
    BWrapper=0x97eb90, bOffset=0, beta=beta@entry=0, CWrapper=0xc39b10, 
    cOffset=0)
    at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7  0x00007ffff7963e33 in BackwardIm2Col::backward (this=0xc4d2c0, 
    batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x9e8140, 
    weightsWrapper=0x97eb90, gradInputWrapper=0x9e7930)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8  0x00007ffff7963392 in BackwardAuto::backward (this=0x97eaa0, 
    batchSize=<optimized out>, inputDataWrapper=<optimized out>, 
---Type <return> to continue, or q <return> to quit---
    gradOutput=<optimized out>, weightsWrapper=<optimized out>, 
    gradInput=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9  0x00007ffff796c143 in ConvolutionalLayer::backward (this=0x994270)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7985d6b in NeuralNet::backward (this=this@entry=0x8b0510, 
    outputData=outputData@entry=0x7fffffdd7fe0)
    at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7995997 in SGD::trainNet (this=0x9b80a0, net=0x8b0510, 
    context=<optimized out>, input=0x7fffffdd8130, outputData=0x7fffffdd7fe0)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7995bb8 in SGD::trainNetFromLabels (this=0x9b80a0, 
    net=0x8b0510, context=0x7fffffdd8080, input=0x7fffffdd8130, 
    labels=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff799669f in Trainer::trainFromLabels (this=0x9b80a0, 
    trainable=<optimized out>, context=0x7fffffdd8080, input=0x7fffffdd8130, 
    labels=0x7fffffffd7d0)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff795a2a5 in LearnBatcher::internalTick (this=0x9c0260, 
    epoch=<optimized out>, batchData=0x7fffffdd8130, 
    batchLabels=0x7fffffffd7d0)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff795a3e8 in Batcher::tick (this=0x9c0260, epoch=2)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff795acdf in NetLearner::tickBatch (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff795ac61 in NetLearner::tickEpoch (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff795abe9 in NetLearner::run (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x00000000004047c6 in network::network (this=0x7fffffffd880, parent=0x0)
    at ../CNN/network.cpp:65
#20 0x0000000000403a6d in main (argc=1, argv=0x7fffffffd9a8)
    at ../CNN/main.cpp:21

gdb output

heelo! 1.1
heelo! 1.2
heelo!  1.5
layer 0:InputLayer{ outputPlanes=4 outputSize=28 }
layer 1:NormalizationLayer{ outputPlanes=4 outputSize=28 translate=-2 scale=0.5 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=4 inputSize=28 numFilters=8 filterSize=5 outputSize=24 padZeros=0 biased=1 skip=0} }
layer 3:ActivationLayer{ RELU }
layer 4:PoolingLayer{ inputPlanes=8 inputSize=24 poolingSize=2 }
layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=8 inputSize=12 numFilters=16 filterSize=5 outputSize=8 padZeros=0 biased=1 skip=0} }
layer 6:ActivationLayer{ RELU }
layer 7:PoolingLayer{ inputPlanes=16 inputSize=8 poolingSize=3 }
layer 8:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
layer 9:ActivationLayer{ RELU }
layer 10:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
layer 11:ActivationLayer{ LINEAR }
layer 12:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
Parameters overview: (skipping 8 layers with 0 params)
layer 1: params=2   0.0%
layer 2: params=808 5.3%
layer 5: params=3216    21.0%
layer 8: params=9750    63.8%
layer 10: params=1510   9.9%
TOTAL  : params=15286
heelo! 2
187500 2250000
statefultimer v0.7
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
backward try kernel 0
  ... not plausibly optimal, skipping
backward try kernel 1
   ... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
  ... not plausibly optimal, skipping
calcGradWeights try kernel 1
   ... seems valid
BackpropWeightsAuto: kernel 1 0ms
backward try kernel 0
  ... not plausibly optimal, skipping
backward try kernel 1
   ... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
  ... not plausibly optimal, skipping
calcGradWeights try kernel 1
   ... seems valid
BackpropWeightsAuto: kernel 1 0ms
backward try kernel 0
  ... not plausibly optimal, skipping
backward try kernel 1
   ... seems valid
BackwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
  ... not plausibly optimal, skipping
calcGradWeights try kernel 1
   ... seems valid
BackpropWeightsAuto: kernel 1 0ms
calcGradWeights try kernel 0
  ... not plausibly optimal, skipping
calcGradWeights try kernel 1
   ... seems valid
BackpropWeightsAuto: kernel 1 0ms

after epoch 1 7 ms
 training loss: 2.69564
 train accuracy: 0/1 0%
forward try kernel 2
   ... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
   ... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
   ... seems valid
ForwardAuto: kernel 2 0ms
forward try kernel 2
   ... seems valid
ForwardAuto: kernel 2 0ms
test accuracy: 1/1 100%
after tests 1 ms
forward try kernel 3
   ... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
   ... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
   ... seems valid
ForwardAuto: kernel 3 0ms
forward try kernel 3
   ... seems valid
ForwardAuto: kernel 3 0ms
backward try kernel 2
   ... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
   ... seems valid
BackpropWeightsAuto: kernel 2 0ms
backward try kernel 2
   ... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
   ... seems valid
BackpropWeightsAuto: kernel 2 0ms
backward try kernel 2
   ... seems valid
BackwardAuto: kernel 2 0ms
calcGradWeights try kernel 2
   ... seems valid
BackpropWeightsAuto: kernel 2 0ms
calcGradWeights try kernel 2
   ... seems valid
BackpropWeightsAuto: kernel 2 0ms

after epoch 2 7 ms
 training loss: 1.79052
 train accuracy: 1/1 100%
forward try kernel 4
   ... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
   ... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
   ... seems valid
ForwardAuto: kernel 4 0ms
forward try kernel 4
   ... seems valid
ForwardAuto: kernel 4 0ms
test accuracy: 1/1 100%
after tests 1 ms
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
   ... not valid
forward try kernel 6
   ... seems valid
ForwardAuto: kernel 6 0ms
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
   ... not valid
forward try kernel 6
   ... seems valid
ForwardAuto: kernel 6 0ms
forward try kernel 5
   ... seems valid
ForwardAuto: kernel 5 0ms
forward try kernel 5
   ... seems valid
ForwardAuto: kernel 5 0ms
backward try kernel 3
   ... seems valid
Didnt initialize clBLAS
terminate called after throwing an instance of 'ClblasNotInitializedException'
hughperkins commented 8 years ago

Thanks. In main.cpp, you currently have the Qt includes before the DeepCL ones:

#include <QGuiApplication>
#include <QQmlApplicationEngine>

#include "network.h"

Can you reverse the order please?

#include "network.h"

#include <QGuiApplication>
#include <QQmlApplicationEngine>

Also, I'm not sure what is in imagenetbatchinfo.h, but can you put it after the DeepCL includes in network.h, please? :

#ifndef NETWORK_H
#define NETWORK_H
#include "DeepCL.h"

#include<iostream>
#include "imagenetbatchinfo.h"

Also, can you uncomment the ClBlasInstance, so I get the stack trace at the point of the segfault?

blacky-i commented 8 years ago

include "imagenetbatchinfo.h" is an empty class for now. i removed it from includes.

i've reversed the order as you said. doesn't change.

Backtrace with ClBlasInstance() uncommented:

#0  0x0000000000000000 in ?? ()
#1  0x00007ffff500202d in getPlatforms (
    platforms=platforms@entry=0x7fffffffd798)
    at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/tools/tune/toolslib.c:449
#2  0x00007ffff50020cd in initStorageCache ()
    at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/tools/tune/toolslib.c:482
#3  0x00007ffff4faf61b in clblasSetup ()
    at /home/bitummon/Projects/Qt/DeepCL/clMathLibraries/clBLAS/src/library/blas/init.c:212
#4  0x0000000000403b18 in main (argc=1, argv=0x7fffffffd9a8)
    at ../CNN/main.cpp:22

gdb output:

Starting program: /home/bitummon/Projects/Qt/build-CNN-Desktop_Qt_5_7_0_GCC_64bit-Debug/CNN 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
QML debugging is enabled. Only use this in a safe environment.
[New Thread 0x7fffecbc1700 (LWP 5560)]
[New Thread 0x7fffe77b6700 (LWP 5561)]

Thread 1 "CNN" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

also adding clinfo output:

clinfo: /usr/local/cuda/lib64/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 8.0.20
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GT 620M
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.1 CUDA
  Driver Version                                  361.42
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               2
  Max clock frequency                             1250MHz
  Compute Capability (NV)                         2.1
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              1073479680 (1024MiB)
  Error Correction support                        No
  Max memory allocation                           268369920 (255.9MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        32768
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
hughperkins commented 8 years ago

OK. What happens if you put this at the start of your main function?

        bool clpresent = 0 == clewInit();
        if(!clpresent) {
            throw std::runtime_error("OpenCL library not found");
        }
blacky-i commented 8 years ago

main.cpp:

#include <clew.h>
#include "network.h"

#include <QGuiApplication>
#include <QQmlApplicationEngine>

int main(int argc, char *argv[])
{

    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);

    bool clpresent = 0 == clewInit();
    if(!clpresent) {
        throw std::runtime_error("OpenCL library not found");
    }

    cl_uint numberPlatform;

    auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);
    if (ret != CL_SUCCESS || numberPlatform == 0) {
        return 0;
    }

    ClBlasInstance();
    network();

    QQmlApplicationEngine engine;
    engine.load(QUrl(QLatin1String("qrc:/main.qml")));

    return app.exec();
}

clpresent = true, numberPlatform = 1, ret = 0.

ClBlasInstance() now works, but I get the error at netLearner again.

backtrace:

#0  0x00007ffff58d3418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff58d501a in __GI_abort () at abort.c:89
#2  0x00007ffff5f0c84d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff5f0a6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff5f0a701 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff5f0a919 in __cxa_throw ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7b5a136 in ClBlasHelper::Gemm (cl=0x6843a0, 
    order=order@entry=clblasColumnMajor, aTrans=aTrans@entry=clblasNoTrans, 
    bTrans=bTrans@entry=clblasTrans, m=<optimized out>, k=<optimized out>, 
    n=150, alpha=alpha@entry=1, AWrapper=0x9e8460, aOffset=0, 
    BWrapper=0x97eeb0, bOffset=0, beta=beta@entry=0, CWrapper=0xc5ae80, 
    cOffset=0)
    at /home/bitummon/Projects/Qt/DeepCL/src/clblas/ClBlasHelper.cpp:40
#7  0x00007ffff7b69e33 in BackwardIm2Col::backward (this=0xc4b070, 
    batchSize=1, inputDataWrapper=<optimized out>, gradOutputWrapper=0x9e8460, 
    weightsWrapper=0x97eeb0, gradInputWrapper=0x9e7c50)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardIm2Col.cpp:64
#8  0x00007ffff7b69392 in BackwardAuto::backward (this=0x97edc0, 
    batchSize=<optimized out>, inputDataWrapper=<optimized out>, 
---Type <return> to continue, or q <return> to quit---
    gradOutput=<optimized out>, weightsWrapper=<optimized out>, 
    gradInput=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/BackwardAuto.cpp:70
#9  0x00007ffff7b72143 in ConvolutionalLayer::backward (this=0x994590)
    at /home/bitummon/Projects/Qt/DeepCL/src/conv/ConvolutionalLayer.cpp:434
#10 0x00007ffff7b8bd6b in NeuralNet::backward (this=this@entry=0x8b0830, 
    outputData=outputData@entry=0x7fffffdd7fe0)
    at /home/bitummon/Projects/Qt/DeepCL/src/net/NeuralNet.cpp:233
#11 0x00007ffff7b9b997 in SGD::trainNet (this=0x9b83c0, net=0x8b0830, 
    context=<optimized out>, input=0x7fffffdd8130, outputData=0x7fffffdd7fe0)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:81
#12 0x00007ffff7b9bbb8 in SGD::trainNetFromLabels (this=0x9b83c0, 
    net=0x8b0830, context=0x7fffffdd8080, input=0x7fffffdd8130, 
    labels=<optimized out>)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/SGD.cpp:108
#13 0x00007ffff7b9c69f in Trainer::trainFromLabels (this=0x9b83c0, 
    trainable=<optimized out>, context=0x7fffffdd8080, input=0x7fffffdd8130, 
    labels=0x7fffffffd7d0)
    at /home/bitummon/Projects/Qt/DeepCL/src/trainers/Trainer.cpp:71
#14 0x00007ffff7b602a5 in LearnBatcher::internalTick (this=0x9c0580, 
    epoch=<optimized out>, batchData=0x7fffffdd8130, 
    batchLabels=0x7fffffffd7d0)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:143
---Type <return> to continue, or q <return> to quit---
#15 0x00007ffff7b603e8 in Batcher::tick (this=0x9c0580, epoch=2)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/Batcher.cpp:101
#16 0x00007ffff7b60cdf in NetLearner::tickBatch (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:83
#17 0x00007ffff7b60c61 in NetLearner::tickEpoch (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:126
#18 0x00007ffff7b60be9 in NetLearner::run (this=0x7fffffffd790)
    at /home/bitummon/Projects/Qt/DeepCL/src/batch/NetLearner.cpp:135
#19 0x0000000000404930 in network::network (this=0x7fffffffd880, parent=0x0)
    at ../CNN/network.cpp:65
#20 0x0000000000403bc8 in main (argc=1, argv=0x7fffffffd9a8)
    at ../CNN/main.cpp:29

And now I have strange behavior: if I remove the lines you mentioned at https://github.com/hughperkins/DeepCL/issues/94#issuecomment-249112023 , I get a segfault at this line in main.cpp: auto ret = clGetPlatformIDs(0, NULL, &numberPlatform);

But if I remove #include <clew.h>, those lines work fine.
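
That behavior is consistent with how clew works: once <clew.h> is included, a name like clGetPlatformIDs refers to a function pointer that stays null until clewInit() loads libOpenCL.so, so calling it early jumps to address zero, which matches the 0x0000000000000000 frame in the backtrace above. A rough sketch of the mechanism, with illustrative names rather than clew's actual source:

    // Illustrative sketch of a clew-style dynamic loader; the real clew differs in detail.
    #include <dlfcn.h>

    typedef int (*PFNCLGETPLATFORMIDS)(unsigned int, void *, unsigned int *);

    // The pointer is null until the loader runs...
    static PFNCLGETPLATFORMIDS my_clGetPlatformIDs = nullptr;
    // ...and the header redirects the ordinary name to it:
    #define clGetPlatformIDs my_clGetPlatformIDs

    int my_clewInit() {
        void *lib = dlopen("libOpenCL.so", RTLD_NOW);
        if (!lib) return -1;
        my_clGetPlatformIDs = (PFNCLGETPLATFORMIDS)dlsym(lib, "clGetPlatformIDs");
        return 0;
    }

    int main() {
        unsigned int n;
        // Without my_clewInit(), the call below would jump through a null pointer (SIGSEGV).
        if (my_clewInit() != 0) return 1;
        return clGetPlatformIDs(0, nullptr, &n);
    }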

hughperkins commented 8 years ago

You also need to create an OpenCL context before instantiating a ClBlasInstance instance. By the way, ClBlasInstance is not really a method, it's an object. Can you put the following at the start of your main, and remove the call to ClBlasInstance?

    EasyCL *cl = 0;
    if(config.gpuIndex >= 0) {
        cl = EasyCL::createForIndexedGpu(config.gpuIndex);
    } else {
        cl = EasyCL::createForFirstGpuOtherwiseCpu();
    }
    ClBlasInstance blasInstance;
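
This also explains the earlier ClblasNotInitializedException. ClBlasInstance is presumably an RAII wrapper whose constructor calls clblasSetup() and whose destructor calls clblasTeardown() (an assumption; DeepCL's actual implementation may differ). Written as the bare statement ClBlasInstance();, it creates a temporary that is destroyed at the end of that statement, tearing clBLAS down again before the network ever runs. Roughly:

    // Sketch of the assumed RAII behavior; not DeepCL's actual source.
    #include <clBLAS.h>

    class ClBlasInstanceSketch {
    public:
        ClBlasInstanceSketch()  { clblasSetup(); }     // initialize on construction
        ~ClBlasInstanceSketch() { clblasTeardown(); }  // tear down on destruction
    };

    void demo() {
        ClBlasInstanceSketch();            // temporary: set up, then torn down immediately
        ClBlasInstanceSketch blasInstance; // named object: stays alive until scope exit
    }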
hughperkins commented 8 years ago

(and pass the cl object into your network() function too please)
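
Putting the pieces together, the corrected main.cpp would look roughly like the sketch below. The network(EasyCL*) constructor signature is hypothetical, since the network class posted earlier only takes a QObject parent:

    #include "DeepCL.h"   // DeepCL/EasyCL first, so their clew-based OpenCL headers win
    #include "network.h"

    #include <QGuiApplication>
    #include <QQmlApplicationEngine>

    int main(int argc, char *argv[])
    {
        QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
        QGuiApplication app(argc, argv);

        // EasyCL's factory runs clewInit() and creates an OpenCL context, so
        // clBLAS has what it needs when the named instance below initializes it.
        EasyCL *cl = EasyCL::createForFirstGpuOtherwiseCpu();
        ClBlasInstance blasInstance;   // named object, alive for the whole run

        network net(cl);               // hypothetical: network adapted to accept cl

        QQmlApplicationEngine engine;
        engine.load(QUrl(QLatin1String("qrc:/main.qml")));
        return app.exec();
    }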

blacky-i commented 8 years ago

OK, but what is the config parameter? The compiler does not know what it is.

blacky-i commented 8 years ago

I've added this:

    EasyCL *cl = 0;
    cl = EasyCL::createForFirstGpuOtherwiseCpu();
    ClBlasInstance blasInstance;

and everything works fine!

No crashes at all. So to sum up: for some reason the linker cannot properly find the OpenCL library?

hughperkins commented 8 years ago

OK, so there are a few things:

hughperkins commented 8 years ago

(Oh, and finally: the clBLAS library expects an OpenCL context to have been created before you initialize it, I'm fairly sure. I can't quite remember how it finds the context though; my understanding is a bit hazy on this last point.)
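
For what it's worth, clBLAS's compute routines each take their command queue(s) as arguments, so the context is effectively supplied per call rather than captured by clblasSetup(). A sketch using the clblasSgemm signature from the clBLAS headers, where queue, A, B, and C are assumed to be a valid cl_command_queue and cl_mem buffers created elsewhere:

    #include <clBLAS.h>

    // Row-major C = A * B for MxK times KxN matrices; the queue carries the context.
    clblasStatus sgemmSketch(cl_command_queue queue,
                             cl_mem A, cl_mem B, cl_mem C,
                             size_t M, size_t N, size_t K) {
        cl_event event;
        return clblasSgemm(clblasRowMajor, clblasNoTrans, clblasNoTrans,
                           M, N, K,
                           1.0f, A, 0, K,    // alpha, A, offset, lda
                           B, 0, N,          // B, offset, ldb
                           0.0f, C, 0, N,    // beta, C, offset, ldc
                           1, &queue,        // command queue(s)
                           0, NULL, &event); // event wait list, completion event
    }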

blacky-i commented 8 years ago

OK, thanks very much for the help!

hughperkins commented 8 years ago

Cool :-)