hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

Keeping GPU initialized and Memory usage #85

Closed merceyz closed 8 years ago

merceyz commented 8 years ago

Hello again,

For my prediction work I'm calling deepcl_predict on a manifest with 9 images, then again on 4 new images, and then, depending on the result of that, on fewer than 4 images (repeated a few times).

This happens a lot, which means that the GPU has to be reinitialized and the network recreated every time.

Is there a way to make it persist so it doesn't have to reinitialize every time?

If not, a file system watcher would be an option: start predict as a "server", have it wait for the manifest file(s) to show up in a specified folder, then run the prediction on the manifest and write the output to another specified file name/location.

Also, deepcl_predict uses ~2 GB of RAM plus 1 GB on the GPU. I don't know whether that is normal for my network size: netdef=4*(60c5z-relu-mp2)-150n-150n-2n, input 96x96x3.

hughperkins commented 8 years ago

Hmmm, that seems strange. Will get hold of a Windows box, and try...

hughperkins commented 8 years ago

Ok, here we go. So, I used the same script as earlier, I've just added instructions to the header:

/*

On Windows 7, tested as follows:

rem Download mnist from http://yann.lecun.com/exdb/mnist/index.html
rem decompress into c:\mnist
rem rename to change the '.' to be a '-', in the filenames
rem then, in a cmd:
call %DEEPCLDIR%\dist\bin\activate.bat
deepcl_train datadir=c:\mnist numtrain=1280 numtest=1280
c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe test.cs
test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=out.txt
rem (then in another window, did: type out.txt)

Run as follows (tested on Ubuntu 16.04, using mono):

source $DEEPCLDIR/dist/bin/activate.sh
deepcl_train datadir=/norep/data/mnist/ numtrain=1280 numtest=1280
# this will create weights.dat
mcs test.cs
# creates test.exe
mono test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=/tmp/out.txt

*/

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public class HelloWorld
{
    static public void Main ()
    {
        int batchSize = 32;
        int planes = 1;
        int imageSize = 28;
        float[,,,] floats = new float[batchSize, planes, imageSize, imageSize];
        int[] dims = new int[3];
        dims[0] = planes;
        dims[1] = imageSize;
        dims[2] = imageSize;

        using (Stream myOutStream = Console.OpenStandardOutput())
        {
            while(true) {
                for(int i = 0; i < 3; i++) {
                    byte[] bytes = BitConverter.GetBytes(dims[i]);
                    myOutStream.Write(bytes, 0, bytes.Length);
                }
                for(int n = 0; n < batchSize; n++) {
                    for(int p = 0; p < planes; p++) {
                        for(int h = 0; h < imageSize; h++) {
                            for(int w = 0; w < imageSize; w++) {
                                byte[] bytes = BitConverter.GetBytes(floats[n,p,h,w]);
                                myOutStream.Write(bytes, 0, bytes.Length);
                            }
                        }
                    }
                }
                myOutStream.Flush();
                Console.ReadLine();
            }
        }
    }
}

Then, as stated, I ran as follows:

rem Download mnist from http://yann.lecun.com/exdb/mnist/index.html
rem decompress into c:\mnist
rem rename to change the '.' to be a '-', in the filenames
rem then, in a cmd:
call %DEEPCLDIR%\dist\bin\activate.bat
deepcl_train datadir=c:\mnist numtrain=1280 numtest=1280
c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe test.cs
test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=out.txt
rem (then in another window, did: type out.txt)

Screenshots to follow

hughperkins commented 8 years ago

Screenshots:

Run training:

[Screenshot: runtraining]

Run prediction:

[Screenshot: runpredict]

[Screenshot: deepclout]

Example prediction output:

[Screenshot: outtxtout]

Can you show me how you are compiling and running this? I'm running on Windows 7 64-bit by the way. Not sure how that compares to your own situation?

merceyz commented 8 years ago

In my program I use the Process class to get direct access to the input stream of deepcl_predict, so I don't need anything else (such as | in CMD). [Image: 0021e0f6f1aa0197b9a45e9e4fe78857]

To try your test class, I copied the code of Main into the Main of a console application and compiled it by hitting Build Solution. I then started it in cmd with roughly the same commands as you.

In neither of my setups is activate.bat called.
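
For reference, the general pattern of starting the predictor once and keeping its standard input open between writes can be sketched as follows. This is a Python illustration, not the C# Process code from the attached image (in C# the rough counterpart would be ProcessStartInfo.RedirectStandardInput). The child command is a hypothetical stand-in that only counts the bytes it receives, so the sketch runs without DeepCL installed.

```python
import subprocess
import sys

# Hypothetical stand-in for deepcl_predict: reads stdin in binary mode until
# EOF and prints how many bytes it received.
CHILD_CODE = "import sys; print(len(sys.stdin.buffer.read()))"


def start_child():
    """Start the child once, with pipes attached to its stdin and stdout."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


def send(child, payload):
    """Write one payload and flush, leaving the pipe open for later writes."""
    child.stdin.write(payload)
    child.stdin.flush()


def finish(child):
    """Close stdin (the child then sees EOF) and return what it printed."""
    out, _ = child.communicate()
    return out.decode().strip()


def demo():
    child = start_child()
    send(child, b"\x00" * 12)   # first write, e.g. a header
    send(child, b"\x00" * 100)  # later write to the same, still-running process
    return finish(child)
```

The child only exits once its stdin is closed, so the expensive startup happens a single time no matter how many payloads are sent.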

hughperkins commented 8 years ago

Ok, please follow the instructions above, ie:

rem Download mnist from http://yann.lecun.com/exdb/mnist/index.html
rem decompress into c:\mnist
rem rename to change the '.' to be a '-', in the filenames
rem then, in a cmd:
call %DEEPCLDIR%\dist\bin\activate.bat
deepcl_train datadir=c:\mnist numtrain=1280 numtest=1280
c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe test.cs
test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=out.txt
rem (then in another window, did: type out.txt)

merceyz commented 8 years ago

A few warnings: [Image: 2439b9e583e424ca4e3bb4aff384c115]

I removed outputfile=out.txt and ran it, which worked. However, all predictions/outputs are identical.

hughperkins commented 8 years ago

I now removed outputfile=out.txt and ran it again which also worked. However all predictions/outputs are identical.

Yes, ok, that's because we are feeding in 0.0 for all values. Here is an updated test.cs that feeds in random numbers:

/*

On Windows 7, tested as follows:

rem Download mnist from http://yann.lecun.com/exdb/mnist/index.html
rem decompress into c:\mnist
rem rename to change the '.' to be a '-', in the filenames
rem then, in a cmd:
call %DEEPCLDIR%\dist\bin\activate.bat
deepcl_train datadir=c:\mnist numtrain=1280 numtest=1280
c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe test.cs
test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=out.txt
rem (then in another window, did: type out.txt)

Run as follows (tested on Ubuntu 16.04, using mono):

source $DEEPCLDIR/dist/bin/activate.sh
deepcl_train datadir=/norep/data/mnist/ numtrain=1280 numtest=1280
# this will create weights.dat
mcs test.cs
# creates test.exe
mono test.exe | deepcl_predict batchsize=1 outputformat=text outputfile=/tmp/out.txt

*/

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public class HelloWorld
{
    static public void Main ()
    {
        int batchSize = 32;
        int planes = 1;
        int imageSize = 28;
        float[,,,] floats = new float[batchSize, planes, imageSize, imageSize];
        Random random = new Random();
        for(int n = 0; n < batchSize; n++) {
            for(int p = 0; p < planes; p++) {
                for(int h = 0; h < imageSize; h++) {
                    for(int w = 0; w < imageSize; w++) {
                        floats[n,p,h,w] = (float)random.NextDouble();
                    }
                }
            }
        }
        int[] dims = new int[3];
        dims[0] = planes;
        dims[1] = imageSize;
        dims[2] = imageSize;

        using (Stream myOutStream = Console.OpenStandardOutput())
        {
            while(true) {
                for(int i = 0; i < 3; i++) {
                    byte[] bytes = BitConverter.GetBytes(dims[i]);
                    myOutStream.Write(bytes, 0, bytes.Length);
                }
                for(int n = 0; n < batchSize; n++) {
                    for(int p = 0; p < planes; p++) {
                        for(int h = 0; h < imageSize; h++) {
                            for(int w = 0; w < imageSize; w++) {
                                byte[] bytes = BitConverter.GetBytes(floats[n,p,h,w]);
                                myOutStream.Write(bytes, 0, bytes.Length);
                            }
                        }
                    }
                }
                myOutStream.Flush();
                Console.ReadLine();
            }
        }
    }
}

Example out.txt:

0.0482792 0.182036 0.0903789 0.07572 0.108117 0.13508 0.105968 0.0957547 0.0724439 0.0862233
0.0483042 0.1822 0.0903414 0.0758145 0.108056 0.134971 0.105971 0.095722 0.0724505 0.08617
0.0482997 0.18206 0.0904025 0.0758086 0.108094 0.134941 0.105954 0.0957871 0.072454 0.086199
0.0482824 0.182142 0.0903218 0.0758162 0.108041 0.135031 0.105992 0.0957432 0.0724605 0.0861704
0.0482508 0.18218
...

The numbers are similar, but not identical.

merceyz commented 8 years ago

Yeah, that did the trick... now the question is why doesn't my approach work..

hughperkins commented 8 years ago

Yeah, that did the trick...

Cool :-)

now the question is why doesn't my approach work..

I'm not sure. You could start by pasting my code into your code, perhaps, and checking whether that works. I.e., is the problem with the data you are writing to stdout? Or is the problem with the Process command? I don't know :-)

hughperkins commented 8 years ago

(You could also try the inverse, i.e. use the Process command to run test.exe. If this works, then the issue is probably with what you are writing to stdout; if it doesn't work, then there's something not quite the same in the way you are launching the process, i.e. the Process command parameters.) I'd also think you should be calling activate.bat somewhere, e.g. before running your main process.

hughperkins commented 8 years ago

Seems like in the C++, I created a verbose option:

    if(config.gpuIndex >= 0) {
        cl = EasyCL::createForIndexedGpu(config.gpuIndex, verbose);
    } else {
        cl = EasyCL::createForFirstGpuOtherwiseCpu(verbose);
    }

I'll see if I can make this available to the python somehow.

merceyz commented 8 years ago

I'm not sure. You could start by pasting my code into your code perhaps, and check that does/doesnt work. ie, is the problem with the data you are writing to stdout? Or is the problem with the Process command? I dont know :-)

I'm sending it directly to the input of deepcl_predict. I'm starting to think it might be that it closes the stream or something. Could you add some debug code so I can see where/when it exits? It dies right after getting the first 3 values (planes, width, height), but the question is why.

hughperkins commented 8 years ago

As long as you give a value to outputfile=, i.e. don't just leave it blank, deepcl_predict will stream spammy stuff about what it's doing to stdout. I know you don't want to activate this for production, but it will do what you want during debugging.

merceyz commented 8 years ago

It stops at inputFile: '' with ~60 MB of RAM, then it dies. The output file is empty, and it never seems to have initialized properly, because when it does it takes up way more RAM.

Arguments: weightsfile="C:\weights.dat" batchsize=1 outputformat=text outputfile="C:\Output.txt"

hughperkins commented 8 years ago

I'm sending it directly to the input of deepcl_predict, i'm starting to think it might be that it closes the stream or something.

Seems plausible.

I imagine if you look deeply into how Process works, there should be some options to control this behavior.

On the whole, given the level of customization that you're looking for, you might want to consider creating e.g. a Python webservice, e.g. using Flask, which wraps DeepCL, since DeepCL has bindings for Python. You'd need to research a bit how to get Flask working and write a thin wrapper around DeepCL in Python to make it behave how you want; everything else would be on the C# side, which would simply call the webservice. I'm not saying it's the best architecture, but it's not entirely non-standard, and it would give you the ability to customize the DeepCL behavior more closely than calling the command-line tool. And it wouldn't involve your needing to write C# bindings either.
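
As a rough illustration of that architecture, here is a minimal sketch that uses Python's standard-library http.server in place of Flask, so it runs without extra packages. FakeNet is a hypothetical placeholder for a network loaded through the DeepCL Python bindings (their API is not shown or assumed here); the only point is that the model object is created once at startup and reused for every request.

```python
import json
import struct
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeNet:
    """Hypothetical stand-in for a network loaded once via the real bindings."""

    def predict(self, floats):
        # Placeholder "prediction": just the mean of the inputs.
        return sum(floats) / len(floats)


NET = FakeNet()  # created once at startup, reused by every request


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The request body is raw little-endian 32-bit floats.
        length = int(self.headers["Content-Length"])
        body = self.rfile.read(length)
        floats = struct.unpack("<%df" % (length // 4), body)
        reply = json.dumps({"output": NET.predict(floats)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_floats(port, floats):
    """Client side: roughly what the C# program would do over HTTP."""
    data = struct.pack("<%df" % len(floats), *floats)
    req = urllib.request.Request("http://127.0.0.1:%d/" % port, data=data)
    # Bypass any configured proxy, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return json.loads(resp.read())
```

In a real wrapper, FakeNet would be replaced by whatever object the bindings provide, and the request format would be whatever suits the C# caller.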

Alternatively, you could write/modify/hack the predict.cpp file to behave as you like. It is at src/main/predict.cpp. You'd need to roll up your sleeves and get stuck into some C++ coding, though...

(Edited with correct link to predict.cpp)

hughperkins commented 8 years ago

(Hmmm, I'm fairly sure you can call into c++ directly from c# right? maybe that would be the way to go perhaps? You might need to add some option to build it as Managed code perhaps?)

merceyz commented 8 years ago

(Hmmm, I'm fairly sure you can call into c++ directly from c# right? maybe that would be the way to go perhaps? You might need to add some option to build it as Managed code perhaps?)

That is possible, yes, using DllImport to call a function. I think the function has to be static though.

merceyz commented 8 years ago

I can try to make changes to predict.cpp, I just need to know how to build it...

hughperkins commented 8 years ago

I can try to make changes to predict.cpp just need to know how to build it...

Ok. So... the first thing is you're going to need to install Microsoft Visual Studio 2010; e.g. for non-commercial usage there is Visual Studio 2010 Express. Hmmm... there seems to be no download page for this. You might need a more recent version, i.e. whatever is available. 2013 used to work ok. I never tried anything more recent than 2013. https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx

You'll also need:

On the whole, I guess I'd rate the amount of effort of the various options like:

hughperkins commented 8 years ago

(probably should use 2015, since that is the compiler for python 3.5 https://docs.python.org/devguide/setup.html )

merceyz commented 8 years ago

I decided to extend your example and used some process communication code I had from previous projects to send test.exe the image data.

So my program starts cmd and runs the commands as per your example; then, instead of Console.ReadLine, it sits and waits for a "job" to be sent to it, processes it, and loops back.

It works, and it initializes using the same code as in your example. It uses way less memory than before; I don't know if you changed anything for that, but I'm not complaining. But when it comes to predicting the "batch", I'm getting weird results.

In one instance deepcl_predict closed instantly. In another it predicted (correctly) one image then closed. In the third it predicted two images then closed.

It was always sent 9 images.

So... more logging as to why it closes?

merceyz commented 8 years ago

Here is your code slightly edited to make it crash.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public class HelloWorld
{
    static public void Main ()
    {
        int batchSize = 32;
        int planes = 1;
        int imageSize = 28;
        float[,,,] floats = new float[batchSize, planes, imageSize, imageSize];
        int[] dims = new int[3];
        dims[0] = planes;
        dims[1] = imageSize;
        dims[2] = imageSize;

        Random ran = new Random();
        using (Stream myOutStream = Console.OpenStandardOutput())
        {
            while(true) {
                for(int i = 0; i < 3; i++) {
                    byte[] bytes = BitConverter.GetBytes(dims[i]);
                    myOutStream.Write(bytes, 0, bytes.Length);
                }
                for(int n = 0; n < batchSize; n++) {
                    for(int p = 0; p < planes; p++) {
                        for(int h = 0; h < imageSize; h++) {
                            for(int w = 0; w < imageSize; w++) {
                                byte[] bytes = BitConverter.GetBytes((float)ran.NextDouble());
                                myOutStream.Write(bytes, 0, bytes.Length);
                            }
                        }
                    }
                }
                myOutStream.Flush();
                Console.ReadLine();
            }
        }
    }
}

merceyz commented 8 years ago

for(int i = 0; i < 3; i++) {
    byte[] bytes = BitConverter.GetBytes(dims[i]);
    myOutStream.Write(bytes, 0, bytes.Length);
}

Also, I had a look at predict.cpp, and this only has to be sent once, right? In the while(more) loop it never updates those again.

hughperkins commented 8 years ago

Ah, right. Yes, just write this once, and then after that, just write minibatches of images. right :-)
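
The layout agreed on here, three header ints written once followed by back-to-back batches of floats, can be sketched like this. It is a Python illustration that assumes 4-byte little-endian ints and floats, which is what BitConverter.GetBytes produces on x86 Windows; `pack_header`, `pack_batch` and `write_stream` are names made up for this sketch, not part of DeepCL.

```python
import io
import struct


def pack_header(planes, size):
    """Three 32-bit ints (planes, size, size), written exactly once."""
    return struct.pack("<3i", planes, size, size)


def pack_batch(images):
    """Concatenate images (flat lists of floats) as 32-bit floats."""
    return b"".join(struct.pack("<%df" % len(img), *img) for img in images)


def write_stream(out, planes, size, batches):
    """Header once, then every batch; flush after each batch."""
    out.write(pack_header(planes, size))
    for batch in batches:
        out.write(pack_batch(batch))
        out.flush()


def demo():
    planes, size = 1, 28
    image = [0.0] * (planes * size * size)
    buf = io.BytesIO()
    # Two batches of one image each, after a single header.
    write_stream(buf, planes, size, [[image], [image]])
    return buf.getvalue()
```

With these numbers the stream is 12 header bytes plus 2 x 784 x 4 payload bytes; the header never reappears between batches.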

hughperkins commented 8 years ago

So, moving this out of the loop fixes the problem?

merceyz commented 8 years ago

Sadly no, but I noticed that if every number in batch1 is the same as the ones in batch2, it works. Thus your example can run, as it uses the same numbers over and over again. Add random to the inputs and it crashes.

I think it might be that it reads from the input stream before it actually has all the data, so .eof() might return true before it should. Though then it shouldn't have worked no matter what.
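
One hypothesis worth checking, which is not confirmed anywhere in this thread: on Windows, a C or C++ program that reads stdin in text mode treats the byte 0x1A (Ctrl-Z) as end of input, and random float data is almost certain to contain that byte, while a repeated constant usually does not. The Python sketch below only counts 0x1A bytes in the two kinds of payload. Whether deepcl_predict actually reads stdin in text mode is an assumption that would need to be verified in predict.cpp.

```python
import random
import struct

CTRL_Z = b"\x1a"  # treated as end-of-file by text-mode stdin on Windows
FLOATS_PER_BATCH = 27648 * 9  # 96x96x3 floats per image, 9 images


def pack_floats(values):
    """Pack floats as 4-byte little-endian values, like BitConverter on x86."""
    values = list(values)
    return struct.pack("<%df" % len(values), *values)


def count_ctrl_z(payload):
    """Number of 0x1A bytes in a binary payload."""
    return payload.count(CTRL_Z)


def random_batch(seed=0):
    """One batch of random floats in [0, 1), as in the crashing case."""
    rng = random.Random(seed)
    return pack_floats(rng.random() for _ in range(FLOATS_PER_BATCH))


def constant_batch(value=123.0):
    """One batch where every float is the same constant."""
    return pack_floats([value] * FLOATS_PER_BATCH)
```

If the count for the random batch is non-zero, a text-mode reader on Windows would stop at the first such byte. The usual remedy in C or C++ is to switch stdin to binary mode with `_setmode(_fileno(stdin), _O_BINARY)` before reading.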

hughperkins commented 8 years ago

Sadly no, but i noticed that if every number in batch1 is the same as the ones in batch2 it works.

That sounds odd. I can't imagine how the numbers themselves would affect anything; they just affect the output numbers, not the codepath. What do you mean concretely by 'Add random to the inputs'? Can you post a modification to my test code that demonstrates this?

merceyz commented 8 years ago

Crashes:

Random ran = new Random();
while (true)
{
    for (int i = 0; i < (27648 * 9); i++) // 96x96x3x9
    {
        myOutStream.Write(BitConverter.GetBytes((float)ran.NextDouble()), 0, 4);
    }

    myOutStream.Flush();
}

Doesn't crash:

Random ran = new Random();
float number = (float)ran.NextDouble();
while (true)
{                    
    for (int i = 0; i < (27648 * 9); i++) // 96x96x3x9
    {
        myOutStream.Write(BitConverter.GetBytes(number), 0, 4);
    }

    myOutStream.Flush();
}

merceyz commented 8 years ago

What do you mean concretely by 'Add random to the inputs'? Can you post a modification to my test code that demonstrates this?

https://github.com/hughperkins/DeepCL/issues/85#issuecomment-236773553

hughperkins commented 8 years ago

This could be a timing issue, as you allude to. What happens if you try:

Random ran = new Random();
float sum = 0;
int it = 0;
while (it < 100000)  // if the loop never terminates, sum can be optimized away
{
    for (int i = 0; i < (27648 * 9); i++) // 96x96x3x9
    {
        sum += (float)ran.NextDouble();  // takes time...
        myOutStream.Write(BitConverter.GetBytes(123.0f), 0, 4);  // just write a constant
    }

    myOutStream.Flush();
    it += 1;
}
Console.WriteLine("sum " + sum);  // so sum is not optimized away

hughperkins commented 8 years ago

#85 (comment)

This is incorrect, as you say, since it writes the header multiple times. It should only write the 3 header ints once, right at the start.

merceyz commented 8 years ago

This could be a timing issue, as you allude to. What happens if you try:

Runs fine, doesn't crash it

This is incorrect ,as you say, since it writes the header multiple times. should only write the 3 header ints once, right at the start.

Yeah, that's from before I noticed that.

hughperkins commented 8 years ago

Ok. Please could you post your current code, that you feel is correct, but crashes.

merceyz commented 8 years ago

I'm assuming you mean the code for "test.exe"

using (Stream myOutStream = Console.OpenStandardOutput())
{
    #region Initialize network
    for (int i = 0; i < 3; i++)
    {
        myOutStream.Write(BitConverter.GetBytes(3), 0, 4);
        myOutStream.Write(BitConverter.GetBytes(96), 0, 4);
        myOutStream.Write(BitConverter.GetBytes(96), 0, 4);
    }

    Random ran = new Random();
    float number = (float)ran.NextDouble();
    for (int i = 0; i < (27648 * 9); i++) // 96x96x3x9
    {
        myOutStream.Write(BitConverter.GetBytes(number), 0, 4);
    }

    myOutStream.Flush();

    client.SendString("Initialized");
    #endregion

    while (true)
    {
        client.SendString("Waiting for image");
        byte[] imageData = client.GetNextMessage();

        if (imageData != null)
        {
            client.SendString("Processing image");

            for (int i = 0; i < 9; i++)
            {
                myOutStream.Write(imageData, (i * 110592), 110592);
                myOutStream.Flush();
            }

            client.SendString("Processed");
        }
        else return;
    }
}
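
A side observation on the initialization region above: as posted, the loop runs three times and writes three ints on each pass, so nine ints (36 bytes) go out where the earlier comments call for three ints (12 bytes) written once. The Python sketch below mimics that loop and shows how many surplus bytes a reader expecting a 12-byte header would then interpret as image data. The reader behaviour is an assumption based on the description in this thread, not on predict.cpp itself.

```python
import struct

HEADER_BYTES = 3 * 4  # three 32-bit ints, per the discussion above


def header_as_posted():
    """Mimic the posted loop: three passes, three ints per pass."""
    out = b""
    for _ in range(3):
        out += struct.pack("<i", 3)
        out += struct.pack("<i", 96)
        out += struct.pack("<i", 96)
    return out


def header_once():
    """The intended header: planes, size, size, written a single time."""
    return struct.pack("<3i", 3, 96, 96)


def surplus_bytes(header):
    """Bytes a reader would see as payload after consuming a 12-byte header."""
    return len(header) - HEADER_BYTES
```

The 24 surplus bytes correspond to six floats, so under that assumption every image that follows would be shifted by six values.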

hughperkins commented 8 years ago

Ok, please remove the client.SendString. It will crash the program :-P

hughperkins commented 8 years ago

Actually, please remove all client.SendString calls.

merceyz commented 8 years ago

Removed. However, client.GetNextMessage() is still there; this is it getting the image over a pipe, so I guess replace it with a byte[] of 96x96x3x4.

hughperkins commented 8 years ago

Ah, client is input from another process, and myOutStream is output to the deepcl process? Alright, I misunderstood. I thought this meant you were doing Console.WriteLine to the output stream going to the deepcl process.

Ok, writing to client should not be an issue... re-reading...

merceyz commented 8 years ago

Yeah, that's correct ^^ Client is a pipe going from test.exe to my program

hughperkins commented 8 years ago

Can you send the size of imageData back to the client, and get the client to print it please?

merceyz commented 8 years ago

995328, which is 96x96x3x4x9

My pipe code is working fine, in case that's what you were hinting at ;)
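
The figure quoted above can be reproduced directly. A small Python check, using only the dimensions given in this thread (96x96 images, 3 planes, 4-byte floats, 9 images per job); `image_slices` is a helper name invented for the sketch:

```python
WIDTH = 96
HEIGHT = 96
PLANES = 3
BYTES_PER_FLOAT = 4
IMAGES_PER_JOB = 9

floats_per_image = WIDTH * HEIGHT * PLANES            # 27648
bytes_per_image = floats_per_image * BYTES_PER_FLOAT  # 110592
bytes_per_job = bytes_per_image * IMAGES_PER_JOB      # 995328


def image_slices(job_bytes):
    """Split one job's byte buffer into per-image chunks."""
    return [job_bytes[i:i + bytes_per_image]
            for i in range(0, len(job_bytes), bytes_per_image)]
```

These match the constants used in the posted code: 27648 floats per image in the random-data loops and 110592 bytes per image in the Write calls.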

hughperkins commented 8 years ago

What do you mean by "995328 is 9696349"?

hughperkins commented 8 years ago

But yeah, seems consistent with the loop. So, it's not that...

hughperkins commented 8 years ago

ah, I see in the email, you added * signs, but they get formatted out. Ok.

merceyz commented 8 years ago

I noticed they got formatted as well ^^

But yeah, seems consistent with the loop. so, its not that...

The loop wasn't really needed, so I removed it and just made it call .Write(imageData, 0, imageData.Length) instead, though both result in the same outcome.

hughperkins commented 8 years ago

So, anyway, what I would tend to do is take steps in between the working code and the non-working code. It's like that game where you guess a number from 1 to 100. Player 1 guesses a number; player 2 says 'higher', 'lower', or 'you win'. The optimal strategy is for player 1 to choose the number in the middle, i.e. 50, then continue to bisect the range.
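
The same bisection idea, applied to an ordered list of changes between a working and a failing program, can be sketched as follows. This is a generic Python illustration; `first_bad` and the `is_bad` callback are hypothetical names, and it assumes that once a change breaks the program, every later state stays broken.

```python
def first_bad(n_changes, is_bad):
    """Return the smallest k in 1..n_changes such that applying the first k
    changes gives a failing program.

    Assumes state 0 is known good, state n_changes is known bad, and
    failures are monotonic in between.
    """
    lo, hi = 0, n_changes  # invariant: state lo is good, state hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi


def count_probes(n_changes, culprit):
    """Find a given culprit and report how many states had to be tested."""
    calls = []

    def is_bad(k):
        calls.append(k)
        return k >= culprit

    found = first_bad(n_changes, is_bad)
    return found, len(calls)
```

For 100 candidate states this needs at most 7 test runs, versus up to 99 when stepping through the changes one at a time.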

Anyway, so, along these lines, let's say the current program is the '100'. We need a '0', i.e. a working program. If you replace your current block with


byte[] imageData = new byte[110592];
while (true)
{
    // client.SendString("Waiting for image");

    // if (imageData != null)
    // {
    //     client.SendString("Processing image");

    // for (int i = 0; i < 9; i++)
    // {
    myOutStream.Write(imageData, 0, 110592);
    myOutStream.Flush();
    // }

    //     client.SendString("Processed");
    // }
}

... does this work?

merceyz commented 8 years ago

It does, yeah

hughperkins commented 8 years ago

Ok, so then let's uncomment/comment stuff, and find the one single line that, when commented or uncommented, switches between crashing and not crashing.

What if you try:


byte[] imageData = new byte[110592];
while (true)
{
    client.SendString("Waiting for image");

    // if (imageData != null)
    // {
    client.SendString("Processing image");

    // for (int i = 0; i < 9; i++)
    // {
    myOutStream.Write(imageData, 0, 110592);
    myOutStream.Flush();
    // }

    client.SendString("Processed");
    // }
}

(but without the client sending any image, just receiving the messages)

merceyz commented 8 years ago

Also works

hughperkins commented 8 years ago

Interesting. Ok, let's fetch the imageData from the client, but just once, outside of the loop:

// byte[] imageData = new byte[110592];
byte[] imageData = client.GetNextMessage();
client.SendString("size image data " + imageData.Length);
while (true)
{
    client.SendString("Waiting for image");

    // if (imageData != null)
    // {
    client.SendString("Processing image");

    // for (int i = 0; i < 9; i++)
    // {
    myOutStream.Write(imageData, 0, 110592);
    myOutStream.Flush();
    // }

    client.SendString("Processed");
    // }
}